Index: head/share/man/man4/ddb.4 =================================================================== --- head/share/man/man4/ddb.4 (revision 180357) +++ head/share/man/man4/ddb.4 (revision 180358) @@ -1,1403 +1,1410 @@ .\" .\" Mach Operating System .\" Copyright (c) 1991,1990 Carnegie Mellon University .\" Copyright (c) 2007 Robert N. M. Watson .\" All Rights Reserved. .\" .\" Permission to use, copy, modify and distribute this software and its .\" documentation is hereby granted, provided that both the copyright .\" notice and this permission notice appear in all copies of the .\" software, derivative works or modified versions, and any portions .\" thereof, and that both notices appear in supporting documentation. .\" .\" CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS" .\" CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND FOR .\" ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE. .\" .\" Carnegie Mellon requests users of this software to return to .\" .\" Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU .\" School of Computer Science .\" Carnegie Mellon University .\" Pittsburgh PA 15213-3890 .\" .\" any improvements or extensions that they make and grant Carnegie Mellon .\" the rights to redistribute these changes. .\" .\" changed a \# to #, since groff choked on it. .\" .\" HISTORY .\" ddb.4,v .\" Revision 1.1 1993/07/15 18:41:02 brezak .\" Man page for DDB .\" .\" Revision 2.6 92/04/08 08:52:57 rpd .\" Changes from OSF. .\" [92/01/17 14:19:22 jsb] .\" Changes for OSF debugger modifications. .\" [91/12/12 tak] .\" .\" Revision 2.5 91/06/25 13:50:22 rpd .\" Added some watchpoint explanation. .\" [91/06/25 rpd] .\" .\" Revision 2.4 91/06/17 15:47:31 jsb .\" Added documentation for continue/c, match, search, and watchpoints. .\" I've not actually explained what a watchpoint is; maybe Rich can .\" do that (hint, hint). .\" [91/06/17 10:58:08 jsb] .\" .\" Revision 2.3 91/05/14 17:04:23 mrt .\" Correcting copyright .\" .\" Revision 2.2 91/02/14 14:10:06 mrt .\" Changed to new Mach copyright .\" [91/02/12 18:10:12 mrt] .\" .\" Revision 2.2 90/08/30 14:23:15 dbg .\" Created. .\" [90/08/30 dbg] .\" .\" $FreeBSD$ .\" -.Dd June 8, 2008 +.Dd July 7, 2008 .Dt DDB 4 .Os .Sh NAME .Nm ddb .Nd interactive kernel debugger .Sh SYNOPSIS In order to enable kernel debugging facilities include: .Bd -ragged -offset indent .Cd options KDB .Cd options DDB .Ed .Pp To prevent activation of the debugger on kernel .Xr panic 9 : .Bd -ragged -offset indent .Cd options KDB_UNATTENDED .Ed .Pp In order to print a stack trace of the current thread on the console for a panic: .Bd -ragged -offset indent .Cd options KDB_TRACE .Ed .Pp To print the numerical value of symbols in addition to the symbolic representation, define: .Bd -ragged -offset indent .Cd options DDB_NUMSYM .Ed .Pp To enable the .Xr gdb 1 backend, so that remote debugging with .Xr kgdb 1 is possible, include: .Bd -ragged -offset indent .Cd options GDB .Ed .Sh DESCRIPTION The .Nm kernel debugger has most of the features of the old .Nm kdb , but with a more rational syntax inspired by .Xr gdb 1 . If linked into the running kernel, it can be invoked locally with the .Ql debug .Xr keymap 5 action. The debugger is also invoked on kernel .Xr panic 9 if the .Va debug.debugger_on_panic .Xr sysctl 8 MIB variable is set non-zero, which is the default unless the .Dv KDB_UNATTENDED option is specified. .Pp The current location is called .Va dot . The .Va dot is displayed with a hexadecimal format at a prompt. The commands .Ic examine and .Ic write update .Va dot to the address of the last line examined or the last location modified, and set .Va next to the address of the next location to be examined or changed. Other commands do not change .Va dot , and set .Va next to be the same as .Va dot . .Pp The general command syntax is: .Ar command Ns Op Li / Ns Ar modifier .Ar address Ns Op Li , Ns Ar count .Pp A blank line repeats the previous command from the address .Va next with count 1 and no modifiers. Specifying .Ar address sets .Va dot to the address. Omitting .Ar address uses .Va dot . A missing .Ar count is taken to be 1 for printing commands or infinity for stack traces. .Pp The .Nm debugger has a pager feature (like the .Xr more 1 command) for the output. If an output line exceeds the number set in the .Va lines variable, it displays .Dq Li --More-- and waits for a response. The valid responses for it are: .Pp .Bl -tag -compact -width ".Li SPC" .It Li SPC one more page .It Li RET one more line .It Li q abort the current command, and return to the command input mode .El .Pp Finally, .Nm provides a small (currently 10 items) command history, and offers simple .Nm emacs Ns -style command line editing capabilities. In addition to the .Nm emacs control keys, the usual .Tn ANSI arrow keys might be used to browse through the history buffer, and move the cursor within the current line. .Sh COMMANDS .Bl -tag -width indent -compact .It Ic examine .It Ic x Display the addressed locations according to the formats in the modifier. Multiple modifier formats display multiple locations. If no format is specified, the last format specified for this command is used. .Pp The format characters are: .Bl -tag -compact -width indent .It Cm b look at by bytes (8 bits) .It Cm h look at by half words (16 bits) .It Cm l look at by long words (32 bits) .It Cm a print the location being displayed .It Cm A print the location with a line number if possible .It Cm x display in unsigned hex .It Cm z display in signed hex .It Cm o display in unsigned octal .It Cm d display in signed decimal .It Cm u display in unsigned decimal .It Cm r display in current radix, signed .It Cm c display low 8 bits as a character. Non-printing characters are displayed as an octal escape code (e.g., .Ql \e000 ) . .It Cm s display the null-terminated string at the location. Non-printing characters are displayed as octal escapes. .It Cm m display in unsigned hex with character dump at the end of each line. The location is also displayed in hex at the beginning of each line. .It Cm i display as an instruction .It Cm I display as an instruction with possible alternate formats depending on the machine: .Bl -tag -width ".Tn powerpc" -compact .It Tn alpha Show the registers of the instruction. .It Tn amd64 No alternate format. .It Tn i386 No alternate format. .It Tn ia64 No alternate format. .It Tn powerpc No alternate format. .It Tn sparc64 No alternate format. .El .It Cm S display a symbol name for the pointer stored at the address .El .Pp .It Ic xf Examine forward: execute an .Ic examine command with the last specified parameters to it except that the next address displayed by it is used as the start address. .Pp .It Ic xb Examine backward: execute an .Ic examine command with the last specified parameters to it except that the last start address subtracted by the size displayed by it is used as the start address. .Pp .It Ic print Ns Op Li / Ns Cm acdoruxz .It Ic p Ns Op Li / Ns Cm acdoruxz Print .Ar addr Ns s according to the modifier character (as described above for .Cm examine ) . Valid formats are: .Cm a , x , z , o , d , u , r , and .Cm c . If no modifier is specified, the last one specified to it is used. The argument .Ar addr can be a string, in which case it is printed as it is. For example: .Bd -literal -offset indent print/x "eax = " $eax "\enecx = " $ecx "\en" .Ed .Pp will print like: .Bd -literal -offset indent eax = xxxxxx ecx = yyyyyy .Ed .Pp .It Xo .Ic write Ns Op Li / Ns Cm bhl .Ar addr expr1 Op Ar expr2 ... .Xc .It Xo .Ic w Ns Op Li / Ns Cm bhl .Ar addr expr1 Op Ar expr2 ... .Xc Write the expressions specified after .Ar addr on the command line at succeeding locations starting with .Ar addr . The write unit size can be specified in the modifier with a letter .Cm b (byte), .Cm h (half word) or .Cm l (long word) respectively. If omitted, long word is assumed. .Pp .Sy Warning : since there is no delimiter between expressions, strange things may happen. It is best to enclose each expression in parentheses. .Pp .It Ic set Li $ Ns Ar variable Oo Li = Oc Ar expr Set the named variable or register with the value of .Ar expr . Valid variable names are described below. .Pp .It Ic break Ns Op Li / Ns Cm u .It Ic b Ns Op Li / Ns Cm u Set a break point at .Ar addr . If .Ar count is supplied, continues .Ar count \- 1 times before stopping at the break point. If the break point is set, a break point number is printed with .Ql # . This number can be used in deleting the break point or adding conditions to it. .Pp If the .Cm u modifier is specified, this command sets a break point in user address space. Without the .Cm u option, the address is considered to be in the kernel space, and a wrong space address is rejected with an error message. This modifier can be used only if it is supported by machine dependent routines. .Pp .Sy Warning : If a user text is shadowed by a normal user space debugger, user space break points may not work correctly. Setting a break point at the low-level code paths may also cause strange behavior. .Pp .It Ic delete Ar addr .It Ic d Ar addr .It Ic delete Li # Ns Ar number .It Ic d Li # Ns Ar number Delete the break point. The target break point can be specified by a break point number with .Ql # , or by using the same .Ar addr specified in the original .Ic break command. .Pp .It Ic watch Ar addr Ns Li , Ns Ar size Set a watchpoint for a region. Execution stops when an attempt to modify the region occurs. The .Ar size argument defaults to 4. If you specify a wrong space address, the request is rejected with an error message. .Pp .Sy Warning : Attempts to watch wired kernel memory may cause unrecoverable error in some systems such as i386. Watchpoints on user addresses work best. .Pp .It Ic hwatch Ar addr Ns Li , Ns Ar size Set a hardware watchpoint for a region if supported by the architecture. Execution stops when an attempt to modify the region occurs. The .Ar size argument defaults to 4. .Pp .Sy Warning : The hardware debug facilities do not have a concept of separate address spaces like the watch command does. Use .Ic hwatch for setting watchpoints on kernel address locations only, and avoid its use on user mode address spaces. .Pp .It Ic dhwatch Ar addr Ns Li , Ns Ar size Delete specified hardware watchpoint. .Pp .It Ic step Ns Op Li / Ns Cm p .It Ic s Ns Op Li / Ns Cm p Single step .Ar count times (the comma is a mandatory part of the syntax). If the .Cm p modifier is specified, print each instruction at each step. Otherwise, only print the last instruction. .Pp .Sy Warning : depending on machine type, it may not be possible to single-step through some low-level code paths or user space code. On machines with software-emulated single-stepping (e.g., pmax), stepping through code executed by interrupt handlers will probably do the wrong thing. .Pp .It Ic continue Ns Op Li / Ns Cm c .It Ic c Ns Op Li / Ns Cm c Continue execution until a breakpoint or watchpoint. If the .Cm c modifier is specified, count instructions while executing. Some machines (e.g., pmax) also count loads and stores. .Pp .Sy Warning : when counting, the debugger is really silently single-stepping. This means that single-stepping on low-level code may cause strange behavior. .Pp .It Ic until Ns Op Li / Ns Cm p Stop at the next call or return instruction. If the .Cm p modifier is specified, print the call nesting depth and the cumulative instruction count at each call or return. Otherwise, only print when the matching return is hit. .Pp .It Ic next Ns Op Li / Ns Cm p .It Ic match Ns Op Li / Ns Cm p Stop at the matching return instruction. If the .Cm p modifier is specified, print the call nesting depth and the cumulative instruction count at each call or return. Otherwise, only print when the matching return is hit. .Pp .It Xo .Ic trace Ns Op Li / Ns Cm u .Op Ar pid | tid .Op Li , Ns Ar count .Xc .It Xo .Ic t Ns Op Li / Ns Cm u .Op Ar pid | tid .Op Li , Ns Ar count .Xc .It Xo .Ic where Ns Op Li / Ns Cm u .Op Ar pid | tid .Op Li , Ns Ar count .Xc .It Xo .Ic bt Ns Op Li / Ns Cm u .Op Ar pid | tid .Op Li , Ns Ar count .Xc Stack trace. The .Cm u option traces user space; if omitted, .Ic trace only traces kernel space. The optional argument .Ar count is the number of frames to be traced. If .Ar count is omitted, all frames are printed. .Pp .Sy Warning : User space stack trace is valid only if the machine dependent code supports it. .Pp .It Xo .Ic search Ns Op Li / Ns Cm bhl .Ar addr .Ar value .Op Ar mask .Op Li , Ns Ar count .Xc Search memory for .Ar value . This command might fail in interesting ways if it does not find the searched-for value. This is because .Nm does not always recover from touching bad memory. The optional .Ar count argument limits the search. .\" .Pp .It Ic show Cm all procs Ns Op Li / Ns Cm m .It Ic ps Ns Op Li / Ns Cm m Display all process information. The process information may not be shown if it is not supported in the machine, or the bottom of the stack of the target process is not in the main memory at that time. The .Cm m modifier will alter the display to show VM map addresses for the process and not show other information. .\" .Pp .It Ic show Cm allchains Show the same information like "show lockchain" does, but for every thread in the system. .\" .Pp .It Ic show Cm alllocks Show all locks that are currently held. .\" .Pp .It Ic show Cm allpcpu The same as "show pcpu", but for every CPU present in the system. .\" .Pp .It Ic show Cm allrman Show information related with resource management, including interrupt request lines, DMA request lines, I/O ports and I/O memory addresses. .\" .Pp .It Ic show Cm apic Dump data about APIC IDT vector mappings. .\" .Pp .It Ic show Cm breaks Show breakpoints set with the "break" command. .\" .Pp .It Ic show Cm buffer Show buffer structure of .Vt struct buf type. Such a structure is used within the .Fx kernel for the I/O subsystem implementation. For an exact interpretation of the output, please see the .Pa sys/buf.h header file. .\" .Pp .It Ic show Cm cbstat Show brief information about the TTY subsystem. +.\" +.Pp +.It Ic show Cm cpusets +Print numbered root and assigned CPU affinity sets. +See +.Xr cpuset 2 +for more details. .\" .Pp .It Ic show Cm cyrixreg Show registers specific to the Cyrix processor. .\" .Pp .It Ic show Cm domain Ar addr Print protocol domain structure .Vt struct domain at address .Ar addr . See the .Pa sys/domain.h header file for more details on the exact meaning of the structure fields. .\" .Pp .It Ic show Cm file Ar addr Show information about the file structure .Vt struct file present at address .Ar addr . .\" .Pp .It Ic show Cm files Show information about every file structure in the system. .\" .Pp .It Ic show Cm freepages Show the number of physical pages in each of the free lists. .\" .Pp .It Ic show Cm geom Op Ar addr If the .Ar addr argument is not given, displays the entire GEOM topology. If .Ar addr is given, displays details about the given GEOM object (class, geom, provider or consumer). .\" .Pp .It Ic show Cm idt Show IDT layout. The first column specifies the IDT vector. The second one is the name of the interrupt/trap handler. Those functions are machine dependent. .\" .Pp .It Ic show Cm inpcb Ar addr Show information on IP Control Block .Vt struct in_pcb present at .Ar addr . .\" .Pp .It Ic show Cm intr Dump information about interrupt handlers. .\" .Pp .It Ic show Cm intrcnt Dump the interrupt statistics. .\" .Pp .It Ic show Cm irqs Show interrupt lines and their respective kernel threads. .\" .Pp .It Ic show Cm lapic Show information from the local APIC registers for this CPU. .\" .Pp .It Ic show Cm lock Ar addr Show lock structure. The output format is as follows: .Bl -tag -offset 0 -width "flags" .It Ic class: Class of the lock. Possible types include .Xr mutex 9 , .Xr rmlock 9 , .Xr rwlock 9 , .Xr sx 9 . .It Ic name: Name of the lock. .It Ic flags: Flags passed to the lock initialization function. For exact possibilities see manual pages of possible lock types. .It Ic state: Current state of a lock. As well as .Ic flags it's lock-specific. .It Ic owner: Lock owner. .El .\" .Pp .It Ic show Cm lockchain Ar addr Show all threads a particular thread at address .Ar addr is waiting on based on non-sleepable and non-spin locks. .\" .Pp .It Ic show Cm lockedbufs Show the same information as "show buf", but for every locked .Vt struct buf object. .\" .Pp .It Ic show Cm lockedvnods List all locked vnodes in the system. .\" .Pp .It Ic show Cm locks Prints all locks that are currently acquired. .\" .Pp .It Ic show Cm locktree .\" .Pp .It Ic show Cm malloc Prints .Xr malloc 9 memory allocator statistics. The output format is as follows: .Pp .Bl -tag -compact -offset indent -width "Requests" .It Ic Type Specifies a type of memory. It is the same as a description string used while defining the given memory type with .Xr MALLOC_DECLARE 9 . .It Ic InUse Number of memory allocations of the given type, for which .Xr free 9 has not been called yet. .It Ic MemUse Total memory consumed by the given allocation type. .It Ic Requests Number of memory allocation requests for the given memory type. .El .Pp The same information can be gathered in userspace with .Dq Nm vmstat Fl m . .\" .Pp .It Ic show Cm map Ns Oo Li / Ns Cm f Oc Ar addr Prints the VM map at .Ar addr . If the .Cm f modifier is specified the complete map is printed. .\" .Pp .It Ic show Cm msgbuf Print the system's message buffer. It is the same output as in the .Dq Nm dmesg case. It is useful if you got a kernel panic, attached a serial cable to the machine and want to get the boot messages from before the system hang. .\" .It Ic show Cm mount Displays short info about all currently mounted file systems. .Pp .It Ic show Cm mount Ar addr Displays details about the given mount point. .Pp .\" .Pp .It Ic show Cm object Ns Oo Li / Ns Cm f Oc Ar addr Prints the VM object at .Ar addr . If the .Cm f option is specified the complete object is printed. .\" .Pp .It Ic show Cm page Show statistics on VM pages. .\" .Pp .It Ic show Cm pageq Show statistics on VM page queues. .\" .Pp .It Ic show Cm pciregs Print PCI bus registers. The same information can be gathered in userspace by running .Dq Nm pciconf Fl lv . .\" .Pp .It Ic show Cm pcpu Print current processor state. The output format is as follows: .Pp .Bl -tag -compact -offset indent -width "spin locks held:" .It Ic cpuid Processor identifier. .It Ic curthread Thread pointer, process identifier and the name of the process. .It Ic curpcb Control block pointer. .It Ic fpcurthread FPU thread pointer. .It Ic idlethread Idle thread pointer. .It Ic APIC ID CPU identifier coming from APIC. .It Ic currentldt LDT pointer. .It Ic spin locks held Names of spin locks held. .El .\" .Pp .It Ic show Cm pgrpdump Dump process groups present within the system. .\" .Pp .It Ic show Cm proc Op Ar addr If no .Op Ar addr is specified, print information about the current process. Otherwise, show information about the process at address .Ar addr . .\" .Pp .It Ic show Cm procvm Show process virtual memory layout. .\" .Pp .It Ic show Cm protosw Ar addr Print protocol switch structure .Vt struct protosw at address .Ar addr . .\" .Pp .It Ic show Cm registers Ns Op Li / Ns Cm u Display the register set. If the .Cm u modifier is specified, it displays user registers instead of kernel registers or the currently saved one. .Pp .Sy Warning : The support of the .Cm u modifier depends on the machine. If not supported, incorrect information will be displayed. .\" .Pp .It Ic show Cm rman Ar addr Show resource manager object .Vt struct rman at address .Ar addr . Addresses of particular pointers can be gathered with "show allrman" command. .\" .Pp .It Ic show Cm rtc Show real time clock value. Useful for long debugging sessions. .\" .Pp .It Ic show Cm sleepchain Show all the threads a particular thread is waiting on based on sleepable locks. .\" .Pp .It Ic show Cm sleepq .It Ic show Cm sleepqueue Both commands provide the same functionality. They show sleepqueue .Vt struct sleepqueue structure. Sleepqueues are used within the .Fx kernel to implement sleepable synchronization primitives (thread holding a lock might sleep or be context switched), which at the time of writing are: .Xr condvar 9 , .Xr sx 9 and standard .Xr msleep 9 interface. .\" .Pp .It Ic show Cm sockbuf Ar addr .It Ic show Cm socket Ar addr Those commands print .Vt struct sockbuf and .Vt struct socket objects placed at .Ar addr . Output consists of all values present in structures mentioned. For exact interpretation and more details, visit .Pa sys/socket.h header file. .\" .Pp .It Ic show Cm sysregs Show system registers (e.g., .Li cr0-4 on i386.) Not present on some platforms. .\" .Pp .It Ic show Cm tcpcb Ar addr Print TCP control block .Vt struct tcpcb lying at address .Ar addr . For exact interpretation of output, visit .Pa netinet/tcp.h header file. .\" .Pp .It Ic show Cm thread Op Ar addr If no .Ar addr is specified, show detailed information about current thread. Otherwise, information about thread at .Ar addr is printed. .\" .Pp .It Ic show Cm threads Show all threads within the system. Output format is as follows: .Pp .Bl -tag -width "PPID" -compact -offset indent -width "Second column" .It Ic First column Thread identifier (TID) .It Ic Second column Thread structure address .It Ic Third column Backtrace. .El .\" .Pp .It Ic show Cm turnstile Ar addr Show turnstile .Vt struct turnstile structure at address .Ar addr . Turnstiles are structures used within the .Fx kernel to implement synchronization primitives which, while holding a specific type of lock, cannot sleep or context switch to another thread. Currently, those are: .Xr mutex 9 , .Xr rwlock 9 , .Xr rmlock 9 . .\" .Pp .It Ic show Cm uma Show UMA allocator statistics. Output consists five columns: .Pp .Bl -tag -compact -offset indent -width "Requests" .It Cm "Zone" Name of the UMA zone. The same string that was passed to .Xr uma_zcreate 9 as a first argument. .It Cm "Size" Size of a given memory object (slab). .It Cm "Used" Number of slabs being currently used. .It Cm "Free" Number of free slabs within the UMA zone. .It Cm "Requests" Number of allocations requests to the given zone. .El .Pp The very same information might be gathered in the userspace with the help of .Dq Nm vmstat Fl z .\" .Pp .It Ic show Cm unpcb Ar addr Shows UNIX domain socket private control block .Vt struct unpcb present at the address .Ar addr .\" .Pp .It Ic show Cm vmochk Prints, whether the internal VM objects are in a map somewhere and none have zero ref counts. .\" .Pp .It Ic show Cm vmopag This is supposed to show physical addresses consumed by a VM object. Currently, it is not possible to use this command when .Xr witness 9 is compiled in the kernel. .\" .Pp .It Ic show Cm vnode Op Ar addr Prints vnode .Vt struct vnode structure lying at .Op Ar addr . For the exact interpretation of the output, look at the .Pa sys/vnode.h header file. .\" .Pp .It Ic show Cm watches Displays all watchpoints. Shows watchpoints set with "watch" command. .\" .Pp .It Ic show Cm witness Shows information about lock acquisition coming from the .Xr witness 9 subsystem. .\" .Pp .It Ic gdb Toggles between remote GDB and DDB mode. In remote GDB mode, another machine is required that runs .Xr gdb 1 using the remote debug feature, with a connection to the serial console port on the target machine. Currently only available on the i386 architecture. .Pp .It Ic halt Halt the system. .Pp .It Ic kill Ar sig pid Send signal .Ar sig to process .Ar pid . The signal is acted on upon returning from the debugger. This command can be used to kill a process causing resource contention in the case of a hung system. See .Xr signal 3 for a list of signals. Note that the arguments are reversed relative to .Xr kill 2 . .Pp .It Ic reboot .It Ic reset Hard reset the system. .Pp .It Ic help Print a short summary of the available commands and command abbreviations. .Pp .It Ic capture on .It Ic capture off .It Ic capture reset .It Ic capture status .Nm supports a basic output capture facility, which can be used to retrieve the results of debugging commands from userpsace using .Xr sysctl 2 . .Ic capture on enables output capture; .Ic capture off disables capture. .Ic capture reset will clear the capture buffer and disable capture. .Ic capture status will report current buffer use, buffer size, and disposition of output capture. .Pp Userspace processes may inspect and manage .Nm capture state using .Xr sysctl 8 : .Pp .Dv debug.ddb.capture.bufsize may be used to query or set the current capture buffer size. .Pp .Dv debug.ddb.capture.maxbufsize may be used to query the compile-time limit on the capture buffer size. .Pp .Dv debug.ddb.capture.bytes may be used to query the number of bytes of output currently in the capture buffer. .Pp .Dv debug.ddb.capture.data returns the contents of the buffer as a string to an appropriately privileged process. .Pp This facility is particularly useful in concert with the scripting and .Xr textdump 4 facilities, allowing scripted debugging output to be captured and committed to disk as part of a textdump for later analysis. The contents of the capture buffer may also be inspected in a kernel core dump using .Xr kgdb 1 . .Pp .It Ic run .It Ic script .It Ic scripts .It Ic unscript Run, define, list, and delete scripts. See the .Sx SCRIPTING section for more information on the scripting facility. .Pp .It Ic textdump set .It Ic textdump status .It Ic textdump unset The .Ic textdump set command may be used to force the next kernel core dump to be a textdump rather than a traditional memory dump or minidump. .Ic textdump status reports whether a textdump has been scheduled. .Ic textdump unset cancels a request to perform a textdump as the next kernel core dump. More information may be found in .Xr textdump 4 . .El .Sh VARIABLES The debugger accesses registers and variables as .Li $ Ns Ar name . Register names are as in the .Dq Ic show Cm registers command. Some variables are suffixed with numbers, and may have some modifier following a colon immediately after the variable name. For example, register variables can have a .Cm u modifier to indicate user register (e.g., .Dq Li $eax:u ) . .Pp Built-in variables currently supported are: .Pp .Bl -tag -width ".Va tabstops" -compact .It Va radix Input and output radix. .It Va maxoff Addresses are printed as .Dq Ar symbol Ns Li + Ns Ar offset unless .Ar offset is greater than .Va maxoff . .It Va maxwidth The width of the displayed line. .It Va lines The number of lines. It is used by the built-in pager. .It Va tabstops Tab stop width. .It Va work Ns Ar xx Work variable; .Ar xx can take values from 0 to 31. .El .Sh EXPRESSIONS Most expression operators in C are supported except .Ql ~ , .Ql ^ , and unary .Ql & . Special rules in .Nm are: .Bl -tag -width ".No Identifiers" .It Identifiers The name of a symbol is translated to the value of the symbol, which is the address of the corresponding object. .Ql \&. and .Ql \&: can be used in the identifier. If supported by an object format dependent routine, .Sm off .Oo Ar filename : Oc Ar func : lineno , .Sm on .Oo Ar filename : Oc Ns Ar variable , and .Oo Ar filename : Oc Ns Ar lineno can be accepted as a symbol. .It Numbers Radix is determined by the first two letters: .Ql 0x : hex, .Ql 0o : octal, .Ql 0t : decimal; otherwise, follow current radix. .It Li \&. .Va dot .It Li + .Va next .It Li .. address of the start of the last line examined. Unlike .Va dot or .Va next , this is only changed by .Ic examine or .Ic write command. .It Li ' last address explicitly specified. .It Li $ Ns Ar variable Translated to the value of the specified variable. It may be followed by a .Ql \&: and modifiers as described above. .It Ar a Ns Li # Ns Ar b A binary operator which rounds up the left hand side to the next multiple of right hand side. .It Li * Ns Ar expr Indirection. It may be followed by a .Ql \&: and modifiers as described above. .El .Sh SCRIPTING .Nm supports a basic scripting facility to allow automating tasks or responses to specific events. Each script consists of a list of DDB commands to be executed sequentially, and is assigned a unique name. Certain script names have special meaning, and will be automatically run on various .Nm events if scripts by those names have been defined. .Pp The .Ic script command may be used to define a script by name. Scripts consist of a series of .Nm commands separated with the .Ic ; character. For example: .Bd -literal -offset indent script kdb.enter.panic=bt; show pcpu script lockinfo=show alllocks; show lockedvnods .Ed .Pp The .Ic scripts command lists currently defined scripts. .Pp The .Ic run command execute a script by name. For example: .Bd -literal -offset indent run lockinfo .Ed .Pp The .Ic unscript command may be used to delete a script by name. For example: .Bd -literal -offset indent unscript kdb.enter.panic .Ed .Pp These functions may also be performed from userspace using the .Xr ddb 8 command. .Pp Certain scripts are run automatically, if defined, for specific .Nm events. The follow scripts are run when various events occur: .Bl -tag -width kdb.enter.powerfail .It Dv kdb.enter.acpi The kernel debugger was entered as a result of an .Xr acpi 4 event. .It Dv kdb.enter.bootflags The kernel debugger was entered at boot as a result of the debugger boot flag being set. .It Dv kdb.enter.break The kernel debugger was entered as a result of a serial or console break. .It Dv kdb.enter.cam The kernel debugger was entered as a result of a .Xr CAM 4 event. .It Dv kdb.enter.mac The kernel debugger was entered as a result of an assertion failure in the .Xr mac_test 4 module of the TrustedBSD MAC Framework. .It Dv kdb.enter.ndis The kernel debugger was entered as a result of an .Xr ndis 4 breakpoint event. .It Dv kdb.enter.netgraph The kernel debugger was entered as a result of a .Xr netgraph 4 event. .It Dv kdb.enter.panic .Xr panic 9 was called. .It Dv kdb.enter.powerfail The kernel debugger was entered as a result of a powerfail NMI on the sparc64 platform. .It Dv kdb.enter.powerpc The kernel debugger was entered as a result of an unimplemented interrupt type on the powerpc platform. .It Dv kdb.enter.sysctl The kernel debugger was entered as a result of the .Dv debug.kdb.enter sysctl being set. .It Dv kdb.enter.trapsig The kernel debugger was entered as a result of a trapsig event on the sparc64 or sun4v platform. .It Dv kdb.enter.unionfs The kernel debugger was entered as a result of an assertion failure in the union file system. .It Dv kdb.enter.unknown The kernel debugger was entered, but no reason has been set. .It Dv kdb.enter.vfslock The kernel debugger was entered as a result of a VFS lock violation. .It Dv kdb.enter.watchdog The kernel debugger was entered as a result of a watchdog firing. .It Dv kdb.enter.witness The kernel debugger was entered as a result of a .Xr witness 4 violation. .El .Pp In the event that none of these scripts is found, .Nm will attempt to execute a default script: .Bl -tag -width kdb.enter.powerfail .It Dv kdb.enter.default The kernel debugger was entered, but a script exactly matching the reason for entering was not defined. This can be used as a catch-all to handle cases not specifically of interest; for example, .Dv kdb.enter.witness might be defined to have special handling, and .Dv kdb.enter.default might be defined to simply panic and reboot. .El .Sh HINTS On machines with an ISA expansion bus, a simple NMI generation card can be constructed by connecting a push button between the A01 and B01 (CHCHK# and GND) card fingers. Momentarily shorting these two fingers together may cause the bridge chipset to generate an NMI, which causes the kernel to pass control to .Nm . Some bridge chipsets do not generate a NMI on CHCHK#, so your mileage may vary. The NMI allows one to break into the debugger on a wedged machine to diagnose problems. Other bus' bridge chipsets may be able to generate NMI using bus specific methods. .Sh FILES Header files mention in this manual page can be found below .Pa /usr/include directory. .Pp .Bl -dash -compact .It .Pa sys/buf.h .It .Pa sys/domain.h .It .Pa netinet/in_pcb.h .It .Pa sys/socket.h .It .Pa sys/vnode.h .El .Sh SEE ALSO .Xr gdb 1 , .Xr kgdb 1 , .Xr acpi 4 , .Xr CAM 4 , .Xr mac_test 4 , .Xr ndis 4 , .Xr netgraph 4 , .Xr textdump 4 , .Xr witness 4 , .Xr ddb 8 , .Xr sysctl 8 , .Xr panic 9 , .Xr witness 9 .Sh HISTORY The .Nm debugger was developed for Mach, and ported to .Bx 386 0.1 . This manual page translated from .Xr man 7 macros by .An Garrett Wollman . .Pp .An Robert N. M. Watson added support for .Nm output capture, .Xr textdump 4 and scripting in .Fx 8.0 . Index: head/sys/kern/kern_cpuset.c =================================================================== --- head/sys/kern/kern_cpuset.c (revision 180357) +++ head/sys/kern/kern_cpuset.c (revision 180358) @@ -1,977 +1,1010 @@ /*- * Copyright (c) 2008, Jeffrey Roberson * All rights reserved. * * Copyright (c) 2008 Nokia Corporation * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice unmodified, this list of conditions, and the following * disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. * */ #include __FBSDID("$FreeBSD$"); +#include "opt_ddb.h" + #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include +#ifdef DDB +#include +#endif /* DDB */ + /* * cpusets provide a mechanism for creating and manipulating sets of * processors for the purpose of constraining the scheduling of threads to * specific processors. * * Each process belongs to an identified set, by default this is set 1. Each * thread may further restrict the cpus it may run on to a subset of this * named set. This creates an anonymous set which other threads and processes * may not join by number. * * The named set is referred to herein as the 'base' set to avoid ambiguity. * This set is usually a child of a 'root' set while the anonymous set may * simply be referred to as a mask. In the syscall api these are referred to * as the ROOT, CPUSET, and MASK levels where CPUSET is called 'base' here. * * Threads inherit their set from their creator whether it be anonymous or * not. This means that anonymous sets are immutable because they may be * shared. To modify an anonymous set a new set is created with the desired * mask and the same parent as the existing anonymous set. This gives the * illusion of each thread having a private mask.A * * Via the syscall apis a user may ask to retrieve or modify the root, base, * or mask that is discovered via a pid, tid, or setid. Modifying a set * modifies all numbered and anonymous child sets to comply with the new mask. * Modifying a pid or tid's mask applies only to that tid but must still * exist within the assigned parent set. * * A thread may not be assigned to a a group seperate from other threads in * the process. This is to remove ambiguity when the setid is queried with * a pid argument. There is no other technical limitation. * * This somewhat complex arrangement is intended to make it easy for * applications to query available processors and bind their threads to * specific processors while also allowing administrators to dynamically * reprovision by changing sets which apply to groups of processes. * * A simple application should not concern itself with sets at all and * rather apply masks to its own threads via CPU_WHICH_TID and a -1 id * meaning 'curthread'. It may query availble cpus for that tid with a * getaffinity call using (CPU_LEVEL_CPUSET, CPU_WHICH_PID, -1, ...). */ static uma_zone_t cpuset_zone; static struct mtx cpuset_lock; static struct setlist cpuset_ids; static struct unrhdr *cpuset_unr; static struct cpuset *cpuset_zero; cpuset_t *cpuset_root; /* * Acquire a reference to a cpuset, all pointers must be tracked with refs. */ struct cpuset * cpuset_ref(struct cpuset *set) { refcount_acquire(&set->cs_ref); return (set); } /* * Walks up the tree from 'set' to find the root. Returns the root * referenced. */ static struct cpuset * cpuset_refroot(struct cpuset *set) { for (; set->cs_parent != NULL; set = set->cs_parent) if (set->cs_flags & CPU_SET_ROOT) break; cpuset_ref(set); return (set); } /* * Find the first non-anonymous set starting from 'set'. Returns this set * referenced. May return the passed in set with an extra ref if it is * not anonymous. */ static struct cpuset * cpuset_refbase(struct cpuset *set) { if (set->cs_id == CPUSET_INVALID) set = set->cs_parent; cpuset_ref(set); return (set); } /* * Release a reference in a context where it is safe to allocte. */ void cpuset_rel(struct cpuset *set) { cpusetid_t id; if (refcount_release(&set->cs_ref) == 0) return; mtx_lock_spin(&cpuset_lock); LIST_REMOVE(set, cs_siblings); id = set->cs_id; if (id != CPUSET_INVALID) LIST_REMOVE(set, cs_link); mtx_unlock_spin(&cpuset_lock); cpuset_rel(set->cs_parent); uma_zfree(cpuset_zone, set); if (id != CPUSET_INVALID) free_unr(cpuset_unr, id); } /* * Deferred release must be used when in a context that is not safe to * allocate/free. This places any unreferenced sets on the list 'head'. */ static void cpuset_rel_defer(struct setlist *head, struct cpuset *set) { if (refcount_release(&set->cs_ref) == 0) return; mtx_lock_spin(&cpuset_lock); LIST_REMOVE(set, cs_siblings); if (set->cs_id != CPUSET_INVALID) LIST_REMOVE(set, cs_link); LIST_INSERT_HEAD(head, set, cs_link); mtx_unlock_spin(&cpuset_lock); } /* * Complete a deferred release. Removes the set from the list provided to * cpuset_rel_defer. */ static void cpuset_rel_complete(struct cpuset *set) { LIST_REMOVE(set, cs_link); cpuset_rel(set->cs_parent); uma_zfree(cpuset_zone, set); } /* * Find a set based on an id. Returns it with a ref. */ static struct cpuset * cpuset_lookup(cpusetid_t setid) { struct cpuset *set; if (setid == CPUSET_INVALID) return (NULL); mtx_lock_spin(&cpuset_lock); LIST_FOREACH(set, &cpuset_ids, cs_link) if (set->cs_id == setid) break; if (set) cpuset_ref(set); mtx_unlock_spin(&cpuset_lock); return (set); } /* * Create a set in the space provided in 'set' with the provided parameters. * The set is returned with a single ref. May return EDEADLK if the set * will have no valid cpu based on restrictions from the parent. */ static int _cpuset_create(struct cpuset *set, struct cpuset *parent, cpuset_t *mask, cpusetid_t id) { if (!CPU_OVERLAP(&parent->cs_mask, mask)) return (EDEADLK); CPU_COPY(mask, &set->cs_mask); LIST_INIT(&set->cs_children); refcount_init(&set->cs_ref, 1); set->cs_flags = 0; mtx_lock_spin(&cpuset_lock); CPU_AND(mask, &parent->cs_mask); set->cs_id = id; set->cs_parent = cpuset_ref(parent); LIST_INSERT_HEAD(&parent->cs_children, set, cs_siblings); if (set->cs_id != CPUSET_INVALID) LIST_INSERT_HEAD(&cpuset_ids, set, cs_link); mtx_unlock_spin(&cpuset_lock); return (0); } /* * Create a new non-anonymous set with the requested parent and mask. May * return failures if the mask is invalid or a new number can not be * allocated. */ static int cpuset_create(struct cpuset **setp, struct cpuset *parent, cpuset_t *mask) { struct cpuset *set; cpusetid_t id; int error; id = alloc_unr(cpuset_unr); if (id == -1) return (ENFILE); *setp = set = uma_zalloc(cpuset_zone, M_WAITOK); error = _cpuset_create(set, parent, mask, id); if (error == 0) return (0); free_unr(cpuset_unr, id); uma_zfree(cpuset_zone, set); return (error); } /* * Recursively check for errors that would occur from applying mask to * the tree of sets starting at 'set'. Checks for sets that would become * empty as well as RDONLY flags. */ static int cpuset_testupdate(struct cpuset *set, cpuset_t *mask) { struct cpuset *nset; cpuset_t newmask; int error; mtx_assert(&cpuset_lock, MA_OWNED); if (set->cs_flags & CPU_SET_RDONLY) return (EPERM); if (!CPU_OVERLAP(&set->cs_mask, mask)) return (EDEADLK); CPU_COPY(&set->cs_mask, &newmask); CPU_AND(&newmask, mask); error = 0; LIST_FOREACH(nset, &set->cs_children, cs_siblings) if ((error = cpuset_testupdate(nset, &newmask)) != 0) break; return (error); } /* * Applies the mask 'mask' without checking for empty sets or permissions. */ static void cpuset_update(struct cpuset *set, cpuset_t *mask) { struct cpuset *nset; mtx_assert(&cpuset_lock, MA_OWNED); CPU_AND(&set->cs_mask, mask); LIST_FOREACH(nset, &set->cs_children, cs_siblings) cpuset_update(nset, &set->cs_mask); return; } /* * Modify the set 'set' to use a copy of the mask provided. Apply this new * mask to restrict all children in the tree. Checks for validity before * applying the changes. */ static int cpuset_modify(struct cpuset *set, cpuset_t *mask) { struct cpuset *root; int error; error = priv_check(curthread, PRIV_SCHED_CPUSET); if (error) return (error); /* * Verify that we have access to this set of * cpus. */ root = set->cs_parent; if (root && !CPU_SUBSET(&root->cs_mask, mask)) return (EINVAL); mtx_lock_spin(&cpuset_lock); error = cpuset_testupdate(set, mask); if (error) goto out; cpuset_update(set, mask); CPU_COPY(mask, &set->cs_mask); out: mtx_unlock_spin(&cpuset_lock); return (error); } /* * Resolve the 'which' parameter of several cpuset apis. * * For WHICH_PID and WHICH_TID return a locked proc and valid proc/tid. Also * checks for permission via p_cansched(). * * For WHICH_SET returns a valid set with a new reference. * * -1 may be supplied for any argument to mean the current proc/thread or * the base set of the current thread. May fail with ESRCH/EPERM. */ static int cpuset_which(cpuwhich_t which, id_t id, struct proc **pp, struct thread **tdp, struct cpuset **setp) { struct cpuset *set; struct thread *td; struct proc *p; int error; *pp = p = NULL; *tdp = td = NULL; *setp = set = NULL; switch (which) { case CPU_WHICH_PID: if (id == -1) { PROC_LOCK(curproc); p = curproc; break; } if ((p = pfind(id)) == NULL) return (ESRCH); break; case CPU_WHICH_TID: if (id == -1) { PROC_LOCK(curproc); p = curproc; td = curthread; break; } sx_slock(&allproc_lock); FOREACH_PROC_IN_SYSTEM(p) { PROC_LOCK(p); FOREACH_THREAD_IN_PROC(p, td) if (td->td_tid == id) break; if (td != NULL) break; PROC_UNLOCK(p); } sx_sunlock(&allproc_lock); if (td == NULL) return (ESRCH); break; case CPU_WHICH_CPUSET: if (id == -1) { thread_lock(curthread); set = cpuset_refbase(curthread->td_cpuset); thread_unlock(curthread); } else set = cpuset_lookup(id); if (set) { *setp = set; return (0); } return (ESRCH); case CPU_WHICH_IRQ: return (0); default: return (EINVAL); } error = p_cansched(curthread, p); if (error) { PROC_UNLOCK(p); return (error); } if (td == NULL) td = FIRST_THREAD_IN_PROC(p); *pp = p; *tdp = td; return (0); } /* * Create an anonymous set with the provided mask in the space provided by * 'fset'. If the passed in set is anonymous we use its parent otherwise * the new set is a child of 'set'. */ static int cpuset_shadow(struct cpuset *set, struct cpuset *fset, cpuset_t *mask) { struct cpuset *parent; if (set->cs_id == CPUSET_INVALID) parent = set->cs_parent; else parent = set; if (!CPU_SUBSET(&parent->cs_mask, mask)) return (EDEADLK); return (_cpuset_create(fset, parent, mask, CPUSET_INVALID)); } /* * Handle two cases for replacing the base set or mask of an entire process. * * 1) Set is non-null and mask is null. This reparents all anonymous sets * to the provided set and replaces all non-anonymous td_cpusets with the * provided set. * 2) Mask is non-null and set is null. This replaces or creates anonymous * sets for every thread with the existing base as a parent. * * This is overly complicated because we can't allocate while holding a * spinlock and spinlocks must be held while changing and examining thread * state. */ static int cpuset_setproc(pid_t pid, struct cpuset *set, cpuset_t *mask) { struct setlist freelist; struct setlist droplist; struct cpuset *tdset; struct cpuset *nset; struct thread *td; struct proc *p; int threads; int nfree; int error; /* * The algorithm requires two passes due to locking considerations. * * 1) Lookup the process and acquire the locks in the required order. * 2) If enough cpusets have not been allocated release the locks and * allocate them. Loop. */ LIST_INIT(&freelist); LIST_INIT(&droplist); nfree = 0; for (;;) { error = cpuset_which(CPU_WHICH_PID, pid, &p, &td, &nset); if (error) goto out; if (nfree >= p->p_numthreads) break; threads = p->p_numthreads; PROC_UNLOCK(p); for (; nfree < threads; nfree++) { nset = uma_zalloc(cpuset_zone, M_WAITOK); LIST_INSERT_HEAD(&freelist, nset, cs_link); } } PROC_LOCK_ASSERT(p, MA_OWNED); /* * Now that the appropriate locks are held and we have enough cpusets, * make sure the operation will succeed before applying changes. The * proc lock prevents td_cpuset from changing between calls. */ error = 0; FOREACH_THREAD_IN_PROC(p, td) { thread_lock(td); tdset = td->td_cpuset; /* * Verify that a new mask doesn't specify cpus outside of * the set the thread is a member of. */ if (mask) { if (tdset->cs_id == CPUSET_INVALID) tdset = tdset->cs_parent; if (!CPU_SUBSET(&tdset->cs_mask, mask)) error = EDEADLK; /* * Verify that a new set won't leave an existing thread * mask without a cpu to run on. It can, however, restrict * the set. */ } else if (tdset->cs_id == CPUSET_INVALID) { if (!CPU_OVERLAP(&set->cs_mask, &tdset->cs_mask)) error = EDEADLK; } thread_unlock(td); if (error) goto unlock_out; } /* * Replace each thread's cpuset while using deferred release. We * must do this because the thread lock must be held while operating * on the thread and this limits the type of operations allowed. */ FOREACH_THREAD_IN_PROC(p, td) { thread_lock(td); /* * If we presently have an anonymous set or are applying a * mask we must create an anonymous shadow set. That is * either parented to our existing base or the supplied set. * * If we have a base set with no anonymous shadow we simply * replace it outright. */ tdset = td->td_cpuset; if (tdset->cs_id == CPUSET_INVALID || mask) { nset = LIST_FIRST(&freelist); LIST_REMOVE(nset, cs_link); if (mask) error = cpuset_shadow(tdset, nset, mask); else error = _cpuset_create(nset, set, &tdset->cs_mask, CPUSET_INVALID); if (error) { LIST_INSERT_HEAD(&freelist, nset, cs_link); thread_unlock(td); break; } } else nset = cpuset_ref(set); cpuset_rel_defer(&droplist, tdset); td->td_cpuset = nset; sched_affinity(td); thread_unlock(td); } unlock_out: PROC_UNLOCK(p); out: while ((nset = LIST_FIRST(&droplist)) != NULL) cpuset_rel_complete(nset); while ((nset = LIST_FIRST(&freelist)) != NULL) { LIST_REMOVE(nset, cs_link); uma_zfree(cpuset_zone, nset); } return (error); } /* * Apply an anonymous mask to a single thread. */ int cpuset_setthread(lwpid_t id, cpuset_t *mask) { struct cpuset *nset; struct cpuset *set; struct thread *td; struct proc *p; int error; nset = uma_zalloc(cpuset_zone, M_WAITOK); error = cpuset_which(CPU_WHICH_TID, id, &p, &td, &set); if (error) goto out; set = NULL; thread_lock(td); error = cpuset_shadow(td->td_cpuset, nset, mask); if (error == 0) { set = td->td_cpuset; td->td_cpuset = nset; sched_affinity(td); nset = NULL; } thread_unlock(td); PROC_UNLOCK(p); if (set) cpuset_rel(set); out: if (nset) uma_zfree(cpuset_zone, nset); return (error); } /* * Creates the cpuset for thread0. We make two sets: * * 0 - The root set which should represent all valid processors in the * system. It is initially created with a mask of all processors * because we don't know what processors are valid until cpuset_init() * runs. This set is immutable. * 1 - The default set which all processes are a member of until changed. * This allows an administrator to move all threads off of given cpus to * dedicate them to high priority tasks or save power etc. */ struct cpuset * cpuset_thread0(void) { struct cpuset *set; int error; cpuset_zone = uma_zcreate("cpuset", sizeof(struct cpuset), NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, 0); mtx_init(&cpuset_lock, "cpuset", NULL, MTX_SPIN | MTX_RECURSE); /* * Create the root system set for the whole machine. Doesn't use * cpuset_create() due to NULL parent. */ set = uma_zalloc(cpuset_zone, M_WAITOK | M_ZERO); set->cs_mask.__bits[0] = -1; LIST_INIT(&set->cs_children); LIST_INSERT_HEAD(&cpuset_ids, set, cs_link); set->cs_ref = 1; set->cs_flags = CPU_SET_ROOT; cpuset_zero = set; cpuset_root = &set->cs_mask; /* * Now derive a default, modifiable set from that to give out. */ set = uma_zalloc(cpuset_zone, M_WAITOK); error = _cpuset_create(set, cpuset_zero, &cpuset_zero->cs_mask, 1); KASSERT(error == 0, ("Error creating default set: %d\n", error)); /* * Initialize the unit allocator. 0 and 1 are allocated above. */ cpuset_unr = new_unrhdr(2, INT_MAX, NULL); return (set); } /* * This is called once the final set of system cpus is known. Modifies * the root set and all children and mark the root readonly. */ static void cpuset_init(void *arg) { cpuset_t mask; CPU_ZERO(&mask); #ifdef SMP mask.__bits[0] = all_cpus; #else mask.__bits[0] = 1; #endif if (cpuset_modify(cpuset_zero, &mask)) panic("Can't set initial cpuset mask.\n"); cpuset_zero->cs_flags |= CPU_SET_RDONLY; } SYSINIT(cpuset, SI_SUB_SMP, SI_ORDER_ANY, cpuset_init, NULL); #ifndef _SYS_SYSPROTO_H_ struct cpuset_args { cpusetid_t *setid; }; #endif int cpuset(struct thread *td, struct cpuset_args *uap) { struct cpuset *root; struct cpuset *set; int error; thread_lock(td); root = cpuset_refroot(td->td_cpuset); thread_unlock(td); error = cpuset_create(&set, root, &root->cs_mask); cpuset_rel(root); if (error) return (error); error = copyout(&set->cs_id, uap->setid, sizeof(set->cs_id)); if (error == 0) error = cpuset_setproc(-1, set, NULL); cpuset_rel(set); return (error); } #ifndef _SYS_SYSPROTO_H_ struct cpuset_setid_args { cpuwhich_t which; id_t id; cpusetid_t setid; }; #endif int cpuset_setid(struct thread *td, struct cpuset_setid_args *uap) { struct cpuset *set; int error; /* * Presently we only support per-process sets. */ if (uap->which != CPU_WHICH_PID) return (EINVAL); set = cpuset_lookup(uap->setid); if (set == NULL) return (ESRCH); error = cpuset_setproc(uap->id, set, NULL); cpuset_rel(set); return (error); } #ifndef _SYS_SYSPROTO_H_ struct cpuset_getid_args { cpulevel_t level; cpuwhich_t which; id_t id; cpusetid_t *setid; #endif int cpuset_getid(struct thread *td, struct cpuset_getid_args *uap) { struct cpuset *nset; struct cpuset *set; struct thread *ttd; struct proc *p; cpusetid_t id; int error; if (uap->level == CPU_LEVEL_WHICH && uap->which != CPU_WHICH_CPUSET) return (EINVAL); error = cpuset_which(uap->which, uap->id, &p, &ttd, &set); if (error) return (error); switch (uap->which) { case CPU_WHICH_TID: case CPU_WHICH_PID: thread_lock(ttd); set = cpuset_refbase(ttd->td_cpuset); thread_unlock(ttd); PROC_UNLOCK(p); break; case CPU_WHICH_CPUSET: break; case CPU_WHICH_IRQ: return (EINVAL); } switch (uap->level) { case CPU_LEVEL_ROOT: nset = cpuset_refroot(set); cpuset_rel(set); set = nset; break; case CPU_LEVEL_CPUSET: break; case CPU_LEVEL_WHICH: break; } id = set->cs_id; cpuset_rel(set); if (error == 0) error = copyout(&id, uap->setid, sizeof(id)); return (error); } #ifndef _SYS_SYSPROTO_H_ struct cpuset_getaffinity_args { cpulevel_t level; cpuwhich_t which; id_t id; size_t cpusetsize; cpuset_t *mask; }; #endif int cpuset_getaffinity(struct thread *td, struct cpuset_getaffinity_args *uap) { struct thread *ttd; struct cpuset *nset; struct cpuset *set; struct proc *p; cpuset_t *mask; int error; size_t size; if (uap->cpusetsize < sizeof(cpuset_t) || uap->cpusetsize > CPU_MAXSIZE / NBBY) return (ERANGE); size = uap->cpusetsize; mask = malloc(size, M_TEMP, M_WAITOK | M_ZERO); error = cpuset_which(uap->which, uap->id, &p, &ttd, &set); if (error) goto out; switch (uap->level) { case CPU_LEVEL_ROOT: case CPU_LEVEL_CPUSET: switch (uap->which) { case CPU_WHICH_TID: case CPU_WHICH_PID: thread_lock(ttd); set = cpuset_ref(ttd->td_cpuset); thread_unlock(ttd); break; case CPU_WHICH_CPUSET: break; case CPU_WHICH_IRQ: error = EINVAL; goto out; } if (uap->level == CPU_LEVEL_ROOT) nset = cpuset_refroot(set); else nset = cpuset_refbase(set); CPU_COPY(&nset->cs_mask, mask); cpuset_rel(nset); break; case CPU_LEVEL_WHICH: switch (uap->which) { case CPU_WHICH_TID: thread_lock(ttd); CPU_COPY(&ttd->td_cpuset->cs_mask, mask); thread_unlock(ttd); break; case CPU_WHICH_PID: FOREACH_THREAD_IN_PROC(p, ttd) { thread_lock(ttd); CPU_OR(mask, &ttd->td_cpuset->cs_mask); thread_unlock(ttd); } break; case CPU_WHICH_CPUSET: CPU_COPY(&set->cs_mask, mask); break; case CPU_WHICH_IRQ: error = intr_getaffinity(uap->id, mask); break; } break; default: error = EINVAL; break; } if (set) cpuset_rel(set); if (p) PROC_UNLOCK(p); if (error == 0) error = copyout(mask, uap->mask, size); out: free(mask, M_TEMP); return (error); } #ifndef _SYS_SYSPROTO_H_ struct cpuset_setaffinity_args { cpulevel_t level; cpuwhich_t which; id_t id; size_t cpusetsize; const cpuset_t *mask; }; #endif int cpuset_setaffinity(struct thread *td, struct cpuset_setaffinity_args *uap) { struct cpuset *nset; struct cpuset *set; struct thread *ttd; struct proc *p; cpuset_t *mask; int error; if (uap->cpusetsize < sizeof(cpuset_t) || uap->cpusetsize > CPU_MAXSIZE / NBBY) return (ERANGE); mask = malloc(uap->cpusetsize, M_TEMP, M_WAITOK | M_ZERO); error = copyin(uap->mask, mask, uap->cpusetsize); if (error) goto out; /* * Verify that no high bits are set. */ if (uap->cpusetsize > sizeof(cpuset_t)) { char *end; char *cp; end = cp = (char *)&mask->__bits; end += uap->cpusetsize; cp += sizeof(cpuset_t); while (cp != end) if (*cp++ != 0) { error = EINVAL; goto out; } } switch (uap->level) { case CPU_LEVEL_ROOT: case CPU_LEVEL_CPUSET: error = cpuset_which(uap->which, uap->id, &p, &ttd, &set); if (error) break; switch (uap->which) { case CPU_WHICH_TID: case CPU_WHICH_PID: thread_lock(ttd); set = cpuset_ref(ttd->td_cpuset); thread_unlock(ttd); PROC_UNLOCK(p); break; case CPU_WHICH_CPUSET: break; case CPU_WHICH_IRQ: error = EINVAL; goto out; } if (uap->level == CPU_LEVEL_ROOT) nset = cpuset_refroot(set); else nset = cpuset_refbase(set); error = cpuset_modify(nset, mask); cpuset_rel(nset); cpuset_rel(set); break; case CPU_LEVEL_WHICH: switch (uap->which) { case CPU_WHICH_TID: error = cpuset_setthread(uap->id, mask); break; case CPU_WHICH_PID: error = cpuset_setproc(uap->id, NULL, mask); break; case CPU_WHICH_CPUSET: error = cpuset_which(CPU_WHICH_CPUSET, uap->id, &p, &ttd, &set); if (error == 0) { error = cpuset_modify(set, mask); cpuset_rel(set); } break; case CPU_WHICH_IRQ: error = intr_setaffinity(uap->id, mask); break; default: error = EINVAL; break; } break; default: error = EINVAL; break; } out: free(mask, M_TEMP); return (error); } + +#ifdef DDB +DB_SHOW_COMMAND(cpusets, db_show_cpusets) +{ + struct cpuset *set; + int cpu, once; + + LIST_FOREACH(set, &cpuset_ids, cs_link) { + db_printf("set=%p id=%-6u ref=%-6d flags=0x%04x parent id=%d\n", + set, set->cs_id, set->cs_ref, set->cs_flags, + (set->cs_parent != NULL) ? set->cs_parent->cs_id : 0); + db_printf(" mask="); + for (once = 0, cpu = 0; cpu < CPU_SETSIZE; cpu++) { + if (CPU_ISSET(cpu, &set->cs_mask)) { + if (once == 0) { + db_printf("%d", cpu); + once = 1; + } else + db_printf(",%d", cpu); + } + } + db_printf("\n"); + if (db_pager_quit) + break; + } +} +#endif /* DDB */