diff --git a/lib/libpmc/pmc.3 b/lib/libpmc/pmc.3 index 3325cb1e3d40..abe9f3208030 100644 --- a/lib/libpmc/pmc.3 +++ b/lib/libpmc/pmc.3 @@ -1,527 +1,531 @@ .\" Copyright (c) 2003-2008 Joseph Koshy. All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" $FreeBSD$ .\" -.Dd August 10, 2021 +.Dd May 28, 2022 .Dt PMC 3 .Os .Sh NAME .Nm pmc .Nd library for accessing hardware performance monitoring counters .Sh LIBRARY .Lb libpmc .Sh SYNOPSIS .In pmc.h .Sh DESCRIPTION The .Lb libpmc provides a programming interface that allows applications to use hardware performance counters to gather performance data about specific processes or for the system as a whole. The library is implemented using the lower-level facilities offered by the .Xr hwpmc 4 driver. .Ss Key Concepts Performance monitoring counters (PMCs) are represented by the library using a software abstraction. These .Dq abstract PMCs can have two scopes: .Bl -bullet .It System scope. These PMCs measure events in a whole-system manner, i.e., independent of the currently executing thread. System scope PMCs are allocated on specific CPUs and do not migrate between CPUs. Non-privileged processes are allowed to allocate system scope PMCs if the .Xr hwpmc 4 sysctl tunable .Va security.bsd.unprivileged_syspmcs is non-zero. .It Process scope. These PMCs only measure hardware events when the processes they are attached to are executing on a CPU. In an SMP system, process scope PMCs migrate between CPUs along with their target processes. .El .Pp Orthogonal to PMC scope, PMCs may be allocated in one of two operational modes: .Bl -bullet .It Counting PMCs measure events according to their scope (system or process). The application needs to explicitly read these counters to retrieve their value. .It Sampling PMCs cause the CPU to be periodically interrupted and information about its state of execution to be collected. Sampling PMCs are used to profile specific processes and kernel threads or to profile the system as a whole. .El .Pp The scope and operational mode for a software PMC are specified at PMC allocation time. An application is allowed to allocate multiple PMCs subject to availability of hardware resources. .Pp The library uses human-readable strings to name the event being measured by hardware.
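For example, the following sketch allocates a process scope counting
PMC measuring the
.Dq Li instructions
event alias, attaches it to the calling process, and reads it; the
six-argument
.Fn pmc_allocate
shown is the interface provided by recent versions of the library:
.Bd -literal -offset indent
pmc_id_t pmcid;
pmc_value_t v;

if (pmc_init() < 0)
	err(EX_OSERR, "pmc_init");
if (pmc_allocate("instructions", PMC_MODE_TC, 0, PMC_CPU_ANY,
    &pmcid, 0) < 0)
	err(EX_OSERR, "pmc_allocate");
if (pmc_attach(pmcid, 0) < 0)	/* a pid of 0 attaches to self */
	err(EX_OSERR, "pmc_attach");
if (pmc_start(pmcid) < 0)
	err(EX_OSERR, "pmc_start");
/* ... run the code being measured ... */
(void) pmc_stop(pmcid);
(void) pmc_read(pmcid, &v);
printf("instructions retired: %ju\en", (uintmax_t)v);
(void) pmc_release(pmcid);
.Ed
.Pp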
The syntax used for specifying a hardware event along with additional event specific qualifiers (if any) is described in detail in section .Sx "EVENT SPECIFIERS" below. .Pp PMCs are associated with the process that allocated them and will be automatically reclaimed by the system when the process exits. Additionally, process-scope PMCs have to be attached to one or more target processes before they can perform measurements. A process-scope PMC may be attached to those target processes that its owner process would otherwise be permitted to debug. An owner process may attach PMCs to itself allowing it to measure its own behavior. Additionally, on some machine architectures, such self-attached PMCs may be read cheaply using specialized instructions supported by the processor. .Pp Certain kinds of PMCs require that a log file be configured before they may be started. These include: .Bl -bullet .It System scope sampling PMCs. .It Process scope sampling PMCs. .It Process scope counting PMCs that have been configured to report PMC readings on process context switches or process exits. .El .Pp Up to one log file may be configured per owner process. Events logged to a log file may be subsequently analyzed using the .Xr pmclog 3 family of functions. .Ss Supported CPUs The CPUs known to the PMC library are named by the .Vt "enum pmc_cputype" enumeration. Supported CPUs include: .Pp .Bl -tag -width "Li PMC_CPU_INTEL_CORE2" -compact .It Li PMC_CPU_AMD_K7 .Tn "AMD Athlon" CPUs. .It Li PMC_CPU_AMD_K8 .Tn "AMD Athlon64" CPUs. .It Li PMC_CPU_INTEL_ATOM .Tn Intel .Tn Atom CPUs and other CPUs conforming to version 3 of the .Tn Intel performance measurement architecture. .It Li PMC_CPU_INTEL_CORE .Tn Intel .Tn Core Solo and .Tn Core Duo CPUs, and other CPUs conforming to version 1 of the .Tn Intel performance measurement architecture. .It Li PMC_CPU_INTEL_CORE2 .Tn Intel .Tn "Core2 Solo" , .Tn "Core2 Duo" and .Tn "Core2 Extreme" CPUs, and other CPUs conforming to version 2 of the .Tn Intel performance measurement architecture. .El .Ss Supported PMCs PMCs supported by this library are named by the .Vt enum pmc_class enumeration. Supported PMC kinds include: .Pp .Bl -tag -width "Li PMC_CLASS_IAF" -compact .It Li PMC_CLASS_IAF Fixed function hardware counters present in CPUs conforming to the .Tn Intel performance measurement architecture version 2 and later. .It Li PMC_CLASS_IAP Programmable hardware counters present in CPUs conforming to the .Tn Intel performance measurement architecture version 1 and later. .It Li PMC_CLASS_K7 Programmable hardware counters present in .Tn "AMD Athlon" CPUs. .It Li PMC_CLASS_K8 Programmable hardware counters present in .Tn "AMD Athlon64" CPUs. .It Li PMC_CLASS_TSC The timestamp counter on i386 and amd64 architecture CPUs. .It Li PMC_CLASS_SOFT Software events. .El .Ss PMC Capabilities Capabilities of performance monitoring hardware are denoted using the .Vt "enum pmc_caps" enumeration. Supported capabilities include: .Pp .Bl -tag -width "Li PMC_CAP_INTERRUPT" -compact .It Li PMC_CAP_CASCADE The ability to cascade counters. +.It Li PMC_CAP_DOMWIDE +Separate counters tied to each NUMA domain. .It Li PMC_CAP_EDGE The ability to count negated to asserted transitions of the hardware conditions being probed for. .It Li PMC_CAP_INTERRUPT The ability to interrupt the CPU. .It Li PMC_CAP_INVERT The ability to invert the sense of the hardware conditions being measured. .It Li PMC_CAP_PRECISE The ability to perform precise sampling.
.It Li PMC_CAP_QUALIFIER The hardware allows the events being monitored to be further qualified in some system dependent way. .It Li PMC_CAP_READ The ability to read from performance counters. .It Li PMC_CAP_SYSTEM The ability to restrict counting of hardware events to when the CPU is running privileged code. +.It Li PMC_CAP_SYSWIDE +A single counter aggregating events for the whole system. .It Li PMC_CAP_THRESHOLD The ability to ignore simultaneous hardware events below a programmable threshold. .It Li PMC_CAP_USER The ability to restrict counting of hardware events to when the CPU is running unprivileged code. .It Li PMC_CAP_WRITE The ability to write to performance counters. .El .Ss CPU Naming Conventions CPUs are named using small integers from zero up to, but excluding, the value returned by function .Fn pmc_ncpu . On platforms supporting sparsely numbered CPUs, not all the numbers in this range will denote valid CPUs. Operations on non-existent CPUs will return an error. .Ss Functional Grouping of the API This section contains a brief overview of the available functionality in the PMC library. Each function listed here is described further in its own manual page. .Bl -tag -width 2n .It Administration .Bl -tag -width 6n -compact .It Fn pmc_disable , Fn pmc_enable Administratively disable (enable) specific performance monitoring counter hardware. Counters that are disabled will not be available for applications to use. .El .It "Convenience Functions" .Bl -tag -width 6n -compact .It Fn pmc_event_names_of_class Returns a list of event names supported by a given PMC type. .It Fn pmc_name_of_capability Convert a .Dv PMC_CAP_* flag to a human-readable string. .It Fn pmc_name_of_class Convert a .Dv PMC_CLASS_* constant to a human-readable string. .It Fn pmc_name_of_cputype Return a human-readable name for a CPU type. .It Fn pmc_name_of_disposition Return a human-readable string describing a PMC's disposition. .It Fn pmc_name_of_event Convert a numeric event code to a human-readable string. .It Fn pmc_name_of_mode Convert a .Dv PMC_MODE_* constant to a human-readable name. .It Fn pmc_name_of_state Return a human-readable string describing a PMC's current state. .El .It "Library Initialization" .Bl -tag -width 6n -compact .It Fn pmc_init Initialize the library. This function must be called before any other library function. .El .It "Log File Handling" .Bl -tag -width 6n -compact .It Fn pmc_configure_logfile Configure a log file for .Xr hwpmc 4 to write logged events to. .It Fn pmc_flush_logfile Flush all pending log data in .Xr hwpmc 4 Ns Ap s buffers. .It Fn pmc_close_logfile Flush all pending log data and close .Xr hwpmc 4 Ns Ap s side of the stream. .It Fn pmc_writelog Append arbitrary user data to the current log file. .El .It "PMC Management" .Bl -tag -width 6n -compact .It Fn pmc_allocate , Fn pmc_release Allocate (free) a PMC. .It Fn pmc_attach , Fn pmc_detach Attach (detach) a process scope PMC to a target. .It Fn pmc_read , Fn pmc_write , Fn pmc_rw Read (write) a value from (to) a PMC. .It Fn pmc_start , Fn pmc_stop Start (stop) a software PMC. .It Fn pmc_set Set the reload value for a sampling PMC. .El .It "Queries" .Bl -tag -width 6n -compact .It Fn pmc_capabilities Retrieve the capabilities for a given PMC. .It Fn pmc_cpuinfo Retrieve information about the CPUs and PMC hardware present in the system. .It Fn pmc_get_driver_stats Retrieve statistics maintained by .Xr hwpmc 4 . .It Fn pmc_ncpu Determine the greatest possible CPU number on the system.
.It Fn pmc_npmc Return the number of hardware PMCs present in a given CPU. .It Fn pmc_pmcinfo Return information about the state of a given CPU's PMCs. .It Fn pmc_width Determine the width of a hardware counter in bits. .El .It "x86 Architecture Specific API" .Bl -tag -width 6n -compact .It Fn pmc_get_msr Returns the processor model specific register number associated with .Fa pmc . Applications may then use the x86 .Ic RDPMC instruction to directly read the contents of the PMC. .El .El .Ss Signal Handling Requirements Applications using PMCs are required to handle the following signals: .Bl -tag -width ".Dv SIGBUS" .It Dv SIGBUS When the .Xr hwpmc 4 module is unloaded using .Xr kldunload 8 , processes that have PMCs allocated to them will be sent a .Dv SIGBUS signal. .It Dv SIGIO The .Xr hwpmc 4 driver will send a PMC owning process a .Dv SIGIO signal if: .Bl -bullet .It Any process-mode PMC allocated by it loses all its target processes. .It The driver encounters an error when writing log data to a configured log file. This error may be retrieved by a subsequent call to .Fn pmc_flush_logfile . .El .El .Ss Typical Program Flow .Bl -enum .It An application would first invoke function .Fn pmc_init to allow the library to initialize itself. .It Signal handling would then be set up. .It Next the application would allocate the PMCs it desires using function .Fn pmc_allocate . .It Initial values for PMCs may be set using function .Fn pmc_set . .It If a log file is necessary for the PMCs to work, it would be configured using function .Fn pmc_configure_logfile . .It Process scope PMCs would then be attached to their target processes using function .Fn pmc_attach . .It The PMCs would then be started using function .Fn pmc_start . .It Once started, the values of counting PMCs may be read using function .Fn pmc_read . For PMCs that write events to the log file, this logged data would be read and parsed using the .Xr pmclog 3 family of functions. .It PMCs are stopped using function .Fn pmc_stop , and process scope PMCs are detached from their targets using function .Fn pmc_detach . .It Before the process exits, it may release its PMCs using function .Fn pmc_release . Any configured log file may be closed using function .Fn pmc_configure_logfile . .El .Sh EVENT SPECIFIERS Event specifiers are strings comprising an event name, followed by optional parameters modifying the semantics of the hardware event being probed. Event names are PMC architecture dependent, but the PMC library defines machine independent aliases for commonly used events. .Pp Event specifier spellings are case-insensitive and space characters, periods, underscores and hyphens are considered equivalent to each other. Thus the event specifiers .Qq "Example Event" , .Qq "example-event" , and .Qq "EXAMPLE_EVENT" are equivalent. .Ss PMC Architecture Dependent Events PMC architecture dependent event specifiers are described in the following manual pages: .Bl -column " PMC_CLASS_TSC " "MANUAL PAGE " .It Em "PMC Class" Ta Em "Manual Page" .It Li PMC_CLASS_IAF Ta Xr pmc.iaf 3 .It Li PMC_CLASS_IAP Ta Xr pmc.atom 3 , Xr pmc.core 3 , Xr pmc.core2 3 .It Li PMC_CLASS_K7 Ta Xr pmc.k7 3 .It Li PMC_CLASS_K8 Ta Xr pmc.k8 3 .It Li PMC_CLASS_TSC Ta Xr pmc.tsc 3 .El .Ss Event Name Aliases Event name aliases are PMC-independent names for commonly used events. The following aliases are known to this version of the .Nm pmc library: .Bl -tag -width indent .It Li branches Measure the number of branches retired.
.It Li branch-mispredicts Measure the number of retired branches that were mispredicted. .It Li cycles Measure processor cycles. This event is implemented using the processor's Time Stamp Counter register. .It Li dc-misses Measure the number of data cache misses. .It Li ic-misses Measure the number of instruction cache misses. .It Li instructions Measure the number of instructions retired. .It Li interrupts Measure the number of interrupts seen. .It Li unhalted-cycles Measure the number of cycles the processor is not in a halted or sleep state. .El .Sh COMPATIBILITY The interface between the .Nm pmc library and the .Xr hwpmc 4 driver is intended to be private to the implementation and may change. In order to ease forward compatibility with future versions of the .Xr hwpmc 4 driver, applications are urged to dynamically link with the .Nm pmc library. .Pp The .Nm pmc API is .Ud .Sh SEE ALSO .Xr pmc.atom 3 , .Xr pmc.core 3 , .Xr pmc.core2 3 , .Xr pmc.haswell 3 , .Xr pmc.haswelluc 3 , .Xr pmc.haswellxeon 3 , .Xr pmc.iaf 3 , .Xr pmc.ivybridge 3 , .Xr pmc.ivybridgexeon 3 , .Xr pmc.k7 3 , .Xr pmc.k8 3 , .Xr pmc.sandybridge 3 , .Xr pmc.sandybridgeuc 3 , .Xr pmc.sandybridgexeon 3 , .Xr pmc.soft 3 , .Xr pmc.tsc 3 , .Xr pmc.westmere 3 , .Xr pmc.westmereuc 3 , .Xr pmc_allocate 3 , .Xr pmc_attach 3 , .Xr pmc_capabilities 3 , .Xr pmc_configure_logfile 3 , .Xr pmc_disable 3 , .Xr pmc_event_names_of_class 3 , .Xr pmc_get_driver_stats 3 , .Xr pmc_get_msr 3 , .Xr pmc_init 3 , .Xr pmc_name_of_capability 3 , .Xr pmc_read 3 , .Xr pmc_set 3 , .Xr pmc_start 3 , .Xr pmclog 3 , .Xr hwpmc 4 , .Xr pmccontrol 8 , .Xr pmcstat 8 .Sh HISTORY The .Nm pmc library first appeared in .Fx 6.0 . .Sh AUTHORS The .Lb libpmc library was written by .An Joseph Koshy Aq Mt jkoshy@FreeBSD.org . diff --git a/sys/sys/pmc.h b/sys/sys/pmc.h index 18c38ca36659..5465d0532a2a 100644 --- a/sys/sys/pmc.h +++ b/sys/sys/pmc.h @@ -1,1236 +1,1241 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 2003-2008, Joseph Koshy * Copyright (c) 2007 The FreeBSD Foundation * All rights reserved. * * Portions of this software were developed by A. Joseph Koshy under * sponsorship from the FreeBSD Foundation and Google, Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
* * $FreeBSD$ */ #ifndef _SYS_PMC_H_ #define _SYS_PMC_H_ #include #include #include #include #include #ifdef _KERNEL #include #include #endif #define PMC_MODULE_NAME "hwpmc" #define PMC_NAME_MAX 64 /* HW counter name size */ #define PMC_CLASS_MAX 8 /* max #classes of PMCs per-system */ /* * Kernel<->userland API version number [MMmmpppp] * * Major numbers are to be incremented when an incompatible change to * the ABI occurs that older clients will not be able to handle. * * Minor numbers are incremented when a backwards compatible change * occurs that allows older correct programs to run unchanged. For * example, when support for a new PMC type is added. * * The patch version is incremented for every bug fix. */ #define PMC_VERSION_MAJOR 0x09 #define PMC_VERSION_MINOR 0x03 #define PMC_VERSION_PATCH 0x0000 #define PMC_VERSION (PMC_VERSION_MAJOR << 24 | \ PMC_VERSION_MINOR << 16 | PMC_VERSION_PATCH) #define PMC_CPUID_LEN 64 /* cpu model name for pmu lookup */ extern char pmc_cpuid[PMC_CPUID_LEN]; /* * Kinds of CPUs known. * * We keep track of CPU variants that need to be distinguished in * some way for PMC operations. CPU names are grouped by manufacturer * and numbered sparsely in order to minimize changes to the ABI involved * when new CPUs are added. */ #define __PMC_CPUS() \ __PMC_CPU(AMD_K7, 0x00, "AMD K7") \ __PMC_CPU(AMD_K8, 0x01, "AMD K8") \ __PMC_CPU(INTEL_P5, 0x80, "Intel Pentium") \ __PMC_CPU(INTEL_P6, 0x81, "Intel Pentium Pro") \ __PMC_CPU(INTEL_CL, 0x82, "Intel Celeron") \ __PMC_CPU(INTEL_PII, 0x83, "Intel Pentium II") \ __PMC_CPU(INTEL_PIII, 0x84, "Intel Pentium III") \ __PMC_CPU(INTEL_PM, 0x85, "Intel Pentium M") \ __PMC_CPU(INTEL_PIV, 0x86, "Intel Pentium IV") \ __PMC_CPU(INTEL_CORE, 0x87, "Intel Core Solo/Duo") \ __PMC_CPU(INTEL_CORE2, 0x88, "Intel Core2") \ __PMC_CPU(INTEL_CORE2EXTREME, 0x89, "Intel Core2 Extreme") \ __PMC_CPU(INTEL_ATOM, 0x8A, "Intel Atom") \ __PMC_CPU(INTEL_COREI7, 0x8B, "Intel Core i7") \ __PMC_CPU(INTEL_WESTMERE, 0x8C, "Intel Westmere") \ __PMC_CPU(INTEL_SANDYBRIDGE, 0x8D, "Intel Sandy Bridge") \ __PMC_CPU(INTEL_IVYBRIDGE, 0x8E, "Intel Ivy Bridge") \ __PMC_CPU(INTEL_SANDYBRIDGE_XEON, 0x8F, "Intel Sandy Bridge Xeon") \ __PMC_CPU(INTEL_IVYBRIDGE_XEON, 0x90, "Intel Ivy Bridge Xeon") \ __PMC_CPU(INTEL_HASWELL, 0x91, "Intel Haswell") \ __PMC_CPU(INTEL_ATOM_SILVERMONT, 0x92, "Intel Atom Silvermont") \ __PMC_CPU(INTEL_NEHALEM_EX, 0x93, "Intel Nehalem Xeon 7500") \ __PMC_CPU(INTEL_WESTMERE_EX, 0x94, "Intel Westmere Xeon E7") \ __PMC_CPU(INTEL_HASWELL_XEON, 0x95, "Intel Haswell Xeon E5 v3") \ __PMC_CPU(INTEL_BROADWELL, 0x96, "Intel Broadwell") \ __PMC_CPU(INTEL_BROADWELL_XEON, 0x97, "Intel Broadwell Xeon") \ __PMC_CPU(INTEL_SKYLAKE, 0x98, "Intel Skylake") \ __PMC_CPU(INTEL_SKYLAKE_XEON, 0x99, "Intel Skylake Xeon") \ __PMC_CPU(INTEL_ATOM_GOLDMONT, 0x9A, "Intel Atom Goldmont") \ __PMC_CPU(INTEL_ICELAKE, 0x9B, "Intel Icelake") \ __PMC_CPU(INTEL_ICELAKE_XEON, 0x9C, "Intel Icelake Xeon") \ __PMC_CPU(INTEL_ALDERLAKE, 0x9D, "Intel Alderlake") \ __PMC_CPU(INTEL_ATOM_GOLDMONT_P, 0x9E, "Intel Atom Goldmont Plus") \ __PMC_CPU(INTEL_ATOM_TREMONT, 0x9F, "Intel Atom Tremont") \ __PMC_CPU(INTEL_XSCALE, 0x100, "Intel XScale") \ __PMC_CPU(MIPS_24K, 0x200, "MIPS 24K") \ __PMC_CPU(MIPS_OCTEON, 0x201, "Cavium Octeon") \ __PMC_CPU(MIPS_74K, 0x202, "MIPS 74K") \ __PMC_CPU(MIPS_BERI, 0x203, "BERI") \ __PMC_CPU(PPC_7450, 0x300, "PowerPC MPC7450") \ __PMC_CPU(PPC_E500, 0x340, "PowerPC e500 Core") \ __PMC_CPU(PPC_970, 0x380, "IBM PowerPC 970") \ __PMC_CPU(PPC_POWER8, 0x390, "IBM POWER8") 
\ __PMC_CPU(GENERIC, 0x400, "Generic") \ __PMC_CPU(ARMV7_CORTEX_A5, 0x500, "ARMv7 Cortex A5") \ __PMC_CPU(ARMV7_CORTEX_A7, 0x501, "ARMv7 Cortex A7") \ __PMC_CPU(ARMV7_CORTEX_A8, 0x502, "ARMv7 Cortex A8") \ __PMC_CPU(ARMV7_CORTEX_A9, 0x503, "ARMv7 Cortex A9") \ __PMC_CPU(ARMV7_CORTEX_A15, 0x504, "ARMv7 Cortex A15") \ __PMC_CPU(ARMV7_CORTEX_A17, 0x505, "ARMv7 Cortex A17") \ __PMC_CPU(ARMV8_CORTEX_A53, 0x600, "ARMv8 Cortex A53") \ __PMC_CPU(ARMV8_CORTEX_A57, 0x601, "ARMv8 Cortex A57") \ __PMC_CPU(ARMV8_CORTEX_A76, 0x602, "ARMv8 Cortex A76") enum pmc_cputype { #undef __PMC_CPU #define __PMC_CPU(S,V,D) PMC_CPU_##S = V, __PMC_CPUS() }; #define PMC_CPU_FIRST PMC_CPU_AMD_K7 #define PMC_CPU_LAST PMC_CPU_ARMV8_CORTEX_A76 /* * Classes of PMCs */ #define __PMC_CLASSES() \ __PMC_CLASS(TSC, 0x00, "CPU Timestamp counter") \ __PMC_CLASS(K7, 0x01, "AMD K7 performance counters") \ __PMC_CLASS(K8, 0x02, "AMD K8 performance counters") \ __PMC_CLASS(P5, 0x03, "Intel Pentium counters") \ __PMC_CLASS(P6, 0x04, "Intel Pentium Pro counters") \ __PMC_CLASS(P4, 0x05, "Intel Pentium-IV counters") \ __PMC_CLASS(IAF, 0x06, "Intel Core2/Atom, fixed function") \ __PMC_CLASS(IAP, 0x07, "Intel Core...Atom, programmable") \ __PMC_CLASS(UCF, 0x08, "Intel Uncore fixed function") \ __PMC_CLASS(UCP, 0x09, "Intel Uncore programmable") \ __PMC_CLASS(XSCALE, 0x0A, "Intel XScale counters") \ __PMC_CLASS(MIPS24K, 0x0B, "MIPS 24K") \ __PMC_CLASS(OCTEON, 0x0C, "Cavium Octeon") \ __PMC_CLASS(PPC7450, 0x0D, "Motorola MPC7450 class") \ __PMC_CLASS(PPC970, 0x0E, "IBM PowerPC 970 class") \ __PMC_CLASS(SOFT, 0x0F, "Software events") \ __PMC_CLASS(ARMV7, 0x10, "ARMv7") \ __PMC_CLASS(ARMV8, 0x11, "ARMv8") \ __PMC_CLASS(MIPS74K, 0x12, "MIPS 74K") \ __PMC_CLASS(E500, 0x13, "Freescale e500 class") \ __PMC_CLASS(BERI, 0x14, "MIPS BERI") \ - __PMC_CLASS(POWER8, 0x15, "IBM POWER8 class") + __PMC_CLASS(POWER8, 0x15, "IBM POWER8 class") \ + __PMC_CLASS(DMC620_PMU_CD2, 0x16, "ARM DMC620 Memory Controller PMU CLKDIV2") \ + __PMC_CLASS(DMC620_PMU_C, 0x17, "ARM DMC620 Memory Controller PMU CLK") \ + __PMC_CLASS(CMN600_PMU, 0x18, "Arm CoreLink CMN600 Coherent Mesh Network PMU") enum pmc_class { #undef __PMC_CLASS #define __PMC_CLASS(S,V,D) PMC_CLASS_##S = V, __PMC_CLASSES() }; #define PMC_CLASS_FIRST PMC_CLASS_TSC -#define PMC_CLASS_LAST PMC_CLASS_POWER8 +#define PMC_CLASS_LAST PMC_CLASS_CMN600_PMU /* * A PMC can be in the following states: * * Hardware states: * DISABLED -- administratively prohibited from being used. * FREE -- HW available for use * Software states: * ALLOCATED -- allocated * STOPPED -- allocated, but not counting events * RUNNING -- allocated, and in operation; 'pm_runcount' * holds the number of CPUs using this PMC at * a given instant * DELETED -- being destroyed */ #define __PMC_HWSTATES() \ __PMC_STATE(DISABLED) \ __PMC_STATE(FREE) #define __PMC_SWSTATES() \ __PMC_STATE(ALLOCATED) \ __PMC_STATE(STOPPED) \ __PMC_STATE(RUNNING) \ __PMC_STATE(DELETED) #define __PMC_STATES() \ __PMC_HWSTATES() \ __PMC_SWSTATES() enum pmc_state { #undef __PMC_STATE #define __PMC_STATE(S) PMC_STATE_##S, __PMC_STATES() __PMC_STATE(MAX) }; #define PMC_STATE_FIRST PMC_STATE_DISABLED #define PMC_STATE_LAST PMC_STATE_DELETED /* * An allocated PMC may used as a 'global' counter or as a * 'thread-private' one. Each such mode of use can be in either * statistical sampling mode or in counting mode. 
Thus a PMC in use * * SS i.e., SYSTEM STATISTICAL -- system-wide statistical profiling * SC i.e., SYSTEM COUNTER -- system-wide counting mode * TS i.e., THREAD STATISTICAL -- thread virtual, statistical profiling * TC i.e., THREAD COUNTER -- thread virtual, counting mode * * Statistical profiling modes rely on the PMC periodically delivering * a interrupt to the CPU (when the configured number of events have * been measured), so the PMC must have the ability to generate * interrupts. * * In counting modes, the PMC counts its configured events, with the * value of the PMC being read whenever needed by its owner process. * * The thread specific modes "virtualize" the PMCs -- the PMCs appear * to be thread private and count events only when the profiled thread * actually executes on the CPU. * * The system-wide "global" modes keep the PMCs running all the time * and are used to measure the behaviour of the whole system. */ #define __PMC_MODES() \ __PMC_MODE(SS, 0) \ __PMC_MODE(SC, 1) \ __PMC_MODE(TS, 2) \ __PMC_MODE(TC, 3) enum pmc_mode { #undef __PMC_MODE #define __PMC_MODE(M,N) PMC_MODE_##M = N, __PMC_MODES() }; #define PMC_MODE_FIRST PMC_MODE_SS #define PMC_MODE_LAST PMC_MODE_TC #define PMC_IS_COUNTING_MODE(mode) \ ((mode) == PMC_MODE_SC || (mode) == PMC_MODE_TC) #define PMC_IS_SYSTEM_MODE(mode) \ ((mode) == PMC_MODE_SS || (mode) == PMC_MODE_SC) #define PMC_IS_SAMPLING_MODE(mode) \ ((mode) == PMC_MODE_SS || (mode) == PMC_MODE_TS) #define PMC_IS_VIRTUAL_MODE(mode) \ ((mode) == PMC_MODE_TS || (mode) == PMC_MODE_TC) /* * PMC row disposition */ #define __PMC_DISPOSITIONS(N) \ __PMC_DISP(STANDALONE) /* global/disabled counters */ \ __PMC_DISP(FREE) /* free/available */ \ __PMC_DISP(THREAD) /* thread-virtual PMCs */ \ __PMC_DISP(UNKNOWN) /* sentinel */ enum pmc_disp { #undef __PMC_DISP #define __PMC_DISP(D) PMC_DISP_##D , __PMC_DISPOSITIONS() }; #define PMC_DISP_FIRST PMC_DISP_STANDALONE #define PMC_DISP_LAST PMC_DISP_THREAD /* * Counter capabilities * * __PMC_CAPS(NAME, VALUE, DESCRIPTION) */ #define __PMC_CAPS() \ __PMC_CAP(INTERRUPT, 0, "generate interrupts") \ __PMC_CAP(USER, 1, "count user-mode events") \ __PMC_CAP(SYSTEM, 2, "count system-mode events") \ __PMC_CAP(EDGE, 3, "do edge detection of events") \ __PMC_CAP(THRESHOLD, 4, "ignore events below a threshold") \ __PMC_CAP(READ, 5, "read PMC counter") \ __PMC_CAP(WRITE, 6, "reprogram PMC counter") \ __PMC_CAP(INVERT, 7, "invert comparison sense") \ __PMC_CAP(QUALIFIER, 8, "further qualify monitored events") \ __PMC_CAP(PRECISE, 9, "perform precise sampling") \ __PMC_CAP(TAGGING, 10, "tag upstream events") \ - __PMC_CAP(CASCADE, 11, "cascade counters") + __PMC_CAP(CASCADE, 11, "cascade counters") \ + __PMC_CAP(SYSWIDE, 12, "system wide counter") \ + __PMC_CAP(DOMWIDE, 13, "NUMA domain wide counter") enum pmc_caps { #undef __PMC_CAP #define __PMC_CAP(NAME, VALUE, DESCR) PMC_CAP_##NAME = (1 << VALUE) , __PMC_CAPS() }; #define PMC_CAP_FIRST PMC_CAP_INTERRUPT -#define PMC_CAP_LAST PMC_CAP_CASCADE +#define PMC_CAP_LAST PMC_CAP_DOMWIDE /* * PMC Event Numbers * * These are generated from the definitions in "dev/hwpmc/pmc_events.h". */ enum pmc_event { #undef __PMC_EV #undef __PMC_EV_BLOCK #define __PMC_EV_BLOCK(C,V) PMC_EV_ ## C ## __BLOCK_START = (V) - 1 , #define __PMC_EV(C,N) PMC_EV_ ## C ## _ ## N , __PMC_EVENTS() }; /* * PMC SYSCALL INTERFACE */ /* * "PMC_OPS" -- these are the commands recognized by the kernel * module, and are used when performing a system call from userland. 
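 *
 * Each request reaches the kernel as a 'struct pmc_syscall_args'
 * (defined later in this header): 'pmop_code' holds one of the
 * PMC_OP_* values and 'pmop_data' points at the corresponding
 * 'struct pmc_op_*' argument structure described below.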
*/ #define __PMC_OPS() \ __PMC_OP(CONFIGURELOG, "Set log file") \ __PMC_OP(FLUSHLOG, "Flush log file") \ __PMC_OP(GETCPUINFO, "Get system CPU information") \ __PMC_OP(GETDRIVERSTATS, "Get driver statistics") \ __PMC_OP(GETMODULEVERSION, "Get module version") \ __PMC_OP(GETPMCINFO, "Get per-cpu PMC information") \ __PMC_OP(PMCADMIN, "Set PMC state") \ __PMC_OP(PMCALLOCATE, "Allocate and configure a PMC") \ __PMC_OP(PMCATTACH, "Attach a PMC to a process") \ __PMC_OP(PMCDETACH, "Detach a PMC from a process") \ __PMC_OP(PMCGETMSR, "Get a PMC's hardware address") \ __PMC_OP(PMCRELEASE, "Release a PMC") \ __PMC_OP(PMCRW, "Read/Set a PMC") \ __PMC_OP(PMCSETCOUNT, "Set initial count/sampling rate") \ __PMC_OP(PMCSTART, "Start a PMC") \ __PMC_OP(PMCSTOP, "Stop a PMC") \ __PMC_OP(WRITELOG, "Write a cookie to the log file") \ __PMC_OP(CLOSELOG, "Close log file") \ __PMC_OP(GETDYNEVENTINFO, "Get dynamic events list") enum pmc_ops { #undef __PMC_OP #define __PMC_OP(N, D) PMC_OP_##N, __PMC_OPS() }; /* * Flags used in operations on PMCs. */ #define PMC_F_UNUSED1 0x00000001 /* unused */ #define PMC_F_DESCENDANTS 0x00000002 /*OP ALLOCATE track descendants */ #define PMC_F_LOG_PROCCSW 0x00000004 /*OP ALLOCATE track ctx switches */ #define PMC_F_LOG_PROCEXIT 0x00000008 /*OP ALLOCATE log proc exits */ #define PMC_F_NEWVALUE 0x00000010 /*OP RW write new value */ #define PMC_F_OLDVALUE 0x00000020 /*OP RW get old value */ /* V2 API */ #define PMC_F_CALLCHAIN 0x00000080 /*OP ALLOCATE capture callchains */ #define PMC_F_USERCALLCHAIN 0x00000100 /*OP ALLOCATE use userspace stack */ /* internal flags */ #define PMC_F_ATTACHED_TO_OWNER 0x00010000 /*attached to owner*/ #define PMC_F_NEEDS_LOGFILE 0x00020000 /*needs log file */ #define PMC_F_ATTACH_DONE 0x00040000 /*attached at least once */ #define PMC_CALLCHAIN_DEPTH_MAX 512 #define PMC_CC_F_USERSPACE 0x01 /*userspace callchain*/ /* * Cookies used to denote allocated PMCs, and the values of PMCs. */ typedef uint32_t pmc_id_t; typedef uint64_t pmc_value_t; #define PMC_ID_INVALID (~ (pmc_id_t) 0) /* * PMC IDs have the following format: * * +-----------------------+-------+-----------+ * | CPU | PMC MODE | CLASS | ROW INDEX | * +-----------------------+-------+-----------+ * * where CPU is 12 bits, MODE 4, CLASS 8, and ROW INDEX 8 Field 'CPU' * is set to the requested CPU for system-wide PMCs or PMC_CPU_ANY for * process-mode PMCs. Field 'PMC MODE' is the allocated PMC mode. * Field 'PMC CLASS' is the class of the PMC. Field 'ROW INDEX' is the * row index for the PMC. * * The 'ROW INDEX' ranges over 0..NWPMCS where NHWPMCS is the total * number of hardware PMCs on this cpu. */ #define PMC_ID_TO_ROWINDEX(ID) ((ID) & 0xFF) #define PMC_ID_TO_CLASS(ID) (((ID) & 0xFF00) >> 8) #define PMC_ID_TO_MODE(ID) (((ID) & 0xF0000) >> 16) #define PMC_ID_TO_CPU(ID) (((ID) & 0xFFF00000) >> 20) #define PMC_ID_MAKE_ID(CPU,MODE,CLASS,ROWINDEX) \ ((((CPU) & 0xFFF) << 20) | (((MODE) & 0xF) << 16) | \ (((CLASS) & 0xFF) << 8) | ((ROWINDEX) & 0xFF)) /* * Data structures for system calls supported by the pmc driver. */ /* * OP PMCALLOCATE * * Allocate a PMC on the named CPU. 
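 *
 * An illustrative request (the field names are those of 'struct
 * pmc_op_pmcallocate' below; the values are examples only):
 *
 *	struct pmc_op_pmcallocate pa;
 *
 *	memset(&pa, 0, sizeof(pa));
 *	pa.pm_class = PMC_CLASS_TSC;	   class of PMC wanted
 *	pa.pm_ev    = ...an event code...; from enum pmc_event
 *	pa.pm_mode  = PMC_MODE_SC;	   system scope, counting
 *	pa.pm_cpu   = 0;		   or PMC_CPU_ANY for process scope
 *	pa.pm_caps  = PMC_CAP_READ;
 *	pa.pm_count = 0;		   initial/sample count
 *
 * On success the allocated PMC's id is returned in 'pa.pm_pmcid'.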
*/ #define PMC_CPU_ANY ~0 struct pmc_op_pmcallocate { uint32_t pm_caps; /* PMC_CAP_* */ uint32_t pm_cpu; /* CPU number or PMC_CPU_ANY */ enum pmc_class pm_class; /* class of PMC desired */ enum pmc_event pm_ev; /* [enum pmc_event] desired */ uint32_t pm_flags; /* additional modifiers PMC_F_* */ enum pmc_mode pm_mode; /* desired mode */ pmc_id_t pm_pmcid; /* [return] process pmc id */ pmc_value_t pm_count; /* initial/sample count */ union pmc_md_op_pmcallocate pm_md; /* MD layer extensions */ }; /* * OP PMCADMIN * * Set the administrative state (i.e., whether enabled or disabled) of * a PMC 'pm_pmc' on CPU 'pm_cpu'. Note that 'pm_pmc' specifies an * absolute PMC number and need not have been first allocated by the * calling process. */ struct pmc_op_pmcadmin { int pm_cpu; /* CPU# */ uint32_t pm_flags; /* flags */ int pm_pmc; /* PMC# */ enum pmc_state pm_state; /* desired state */ }; /* * OP PMCATTACH / OP PMCDETACH * * Attach/detach a PMC and a process. */ struct pmc_op_pmcattach { pmc_id_t pm_pmc; /* PMC to attach to */ pid_t pm_pid; /* target process */ }; /* * OP PMCSETCOUNT * * Set the sampling rate (i.e., the reload count) for statistical counters. * 'pm_pmcid' needs to have been previously allocated using PMCALLOCATE. */ struct pmc_op_pmcsetcount { pmc_value_t pm_count; /* initial/sample count */ pmc_id_t pm_pmcid; /* PMC id to set */ }; /* * OP PMCRW * * Read the value of a PMC named by 'pm_pmcid'. 'pm_pmcid' needs * to have been previously allocated using PMCALLOCATE. */ struct pmc_op_pmcrw { uint32_t pm_flags; /* PMC_F_{OLD,NEW}VALUE*/ pmc_id_t pm_pmcid; /* pmc id */ pmc_value_t pm_value; /* new&returned value */ }; /* * OP GETPMCINFO * * Retrieve PMC state for a named CPU. The caller is expected to * allocate 'npmc' * 'struct pmc_info' bytes of space for the return * values. */ struct pmc_info { char pm_name[PMC_NAME_MAX]; /* pmc name */ enum pmc_class pm_class; /* enum pmc_class */ int pm_enabled; /* whether enabled */ enum pmc_disp pm_rowdisp; /* FREE, THREAD or STANDALONE */ pid_t pm_ownerpid; /* owner, or -1 */ enum pmc_mode pm_mode; /* current mode [enum pmc_mode] */ enum pmc_event pm_event; /* current event */ uint32_t pm_flags; /* current flags */ pmc_value_t pm_reloadcount; /* sampling counters only */ }; struct pmc_op_getpmcinfo { int32_t pm_cpu; /* 0 <= cpu < mp_maxid */ struct pmc_info pm_pmcs[]; /* space for 'npmc' structures */ }; /* * OP GETCPUINFO * * Retrieve system CPU information. */ struct pmc_classinfo { enum pmc_class pm_class; /* class id */ uint32_t pm_caps; /* counter capabilities */ uint32_t pm_width; /* width of the PMC */ uint32_t pm_num; /* number of PMCs in class */ }; struct pmc_op_getcpuinfo { enum pmc_cputype pm_cputype; /* what kind of CPU */ uint32_t pm_ncpu; /* max CPU number */ uint32_t pm_npmc; /* #PMCs per CPU */ uint32_t pm_nclass; /* #classes of PMCs */ struct pmc_classinfo pm_classes[PMC_CLASS_MAX]; }; /* * OP CONFIGURELOG * * Configure a log file for writing system-wide statistics to. */ struct pmc_op_configurelog { int pm_flags; int pm_logfd; /* logfile fd (or -1) */ }; /* * OP GETDRIVERSTATS * * Retrieve pmc(4) driver-wide statistics.
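 *
 * The kernel keeps these statistics in counter(9) counters (see
 * 'struct pmc_driverstats' below); this operation copies them into
 * the plain integer fields of 'struct pmc_op_getdriverstats' for
 * return to userland.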
*/ #ifdef _KERNEL struct pmc_driverstats { counter_u64_t pm_intr_ignored; /* #interrupts ignored */ counter_u64_t pm_intr_processed; /* #interrupts processed */ counter_u64_t pm_intr_bufferfull; /* #interrupts with ENOSPC */ counter_u64_t pm_syscalls; /* #syscalls */ counter_u64_t pm_syscall_errors; /* #syscalls with errors */ counter_u64_t pm_buffer_requests; /* #buffer requests */ counter_u64_t pm_buffer_requests_failed; /* #failed buffer requests */ counter_u64_t pm_log_sweeps; /* #sample buffer processing passes */ counter_u64_t pm_merges; /* merged k+u */ counter_u64_t pm_overwrites; /* UR overwrites */ }; #endif struct pmc_op_getdriverstats { unsigned int pm_intr_ignored; /* #interrupts ignored */ unsigned int pm_intr_processed; /* #interrupts processed */ unsigned int pm_intr_bufferfull; /* #interrupts with ENOSPC */ unsigned int pm_syscalls; /* #syscalls */ unsigned int pm_syscall_errors; /* #syscalls with errors */ unsigned int pm_buffer_requests; /* #buffer requests */ unsigned int pm_buffer_requests_failed; /* #failed buffer requests */ unsigned int pm_log_sweeps; /* #sample buffer processing passes */ }; /* * OP RELEASE / OP START / OP STOP * * Simple operations on a PMC id. */ struct pmc_op_simple { pmc_id_t pm_pmcid; }; /* * OP WRITELOG * * Flush the current log buffer and write 4 bytes of user data to it. */ struct pmc_op_writelog { uint32_t pm_userdata; }; /* * OP GETMSR * * Retrieve the machine specific address associated with the allocated * PMC. This number can be used subsequently with a read-performance-counter * instruction. */ struct pmc_op_getmsr { uint32_t pm_msr; /* machine specific address */ pmc_id_t pm_pmcid; /* allocated pmc id */ }; /* * OP GETDYNEVENTINFO * * Retrieve a PMC dynamic class events list. */ struct pmc_dyn_event_descr { char pm_ev_name[PMC_NAME_MAX]; enum pmc_event pm_ev_code; }; struct pmc_op_getdyneventinfo { enum pmc_class pm_class; unsigned int pm_nevent; struct pmc_dyn_event_descr pm_events[PMC_EV_DYN_COUNT]; }; #ifdef _KERNEL #include #include #include #include #define PMC_HASH_SIZE 1024 #define PMC_MTXPOOL_SIZE 2048 #define PMC_LOG_BUFFER_SIZE 256 #define PMC_NLOGBUFFERS_PCPU 32 #define PMC_NSAMPLES 256 #define PMC_CALLCHAIN_DEPTH 128 #define PMC_THREADLIST_MAX 128 #define PMC_SYSCTL_NAME_PREFIX "kern." PMC_MODULE_NAME "." /* * Locking keys * * (b) - pmc_bufferlist_mtx (spin lock) * (k) - pmc_kthread_mtx (sleep lock) * (o) - po->po_mtx (spin lock) * (g) - global_epoch_preempt (epoch) * (p) - pmc_sx (sx) */ /* * PMC commands */ struct pmc_syscall_args { register_t pmop_code; /* one of PMC_OP_* */ void *pmop_data; /* syscall parameter */ }; /* * Interface to processor specific s1tuff */ /* * struct pmc_descr * * Machine independent (i.e., the common parts) of a human readable * PMC description. */ struct pmc_descr { char pd_name[PMC_NAME_MAX]; /* name */ uint32_t pd_caps; /* capabilities */ enum pmc_class pd_class; /* class of the PMC */ uint32_t pd_width; /* width in bits */ }; /* * struct pmc_target * * This structure records all the target processes associated with a * PMC. */ struct pmc_target { LIST_ENTRY(pmc_target) pt_next; struct pmc_process *pt_process; /* target descriptor */ }; /* * struct pmc * * Describes each allocated PMC. * * Each PMC has precisely one owner, namely the process that allocated * the PMC. * * A PMC may be attached to multiple target processes. The * 'pm_targets' field links all the target processes being monitored * by this PMC. * * The 'pm_savedvalue' field is protected by a mutex. 
* * On a multi-cpu machine, multiple target threads associated with a * process-virtual PMC could be concurrently executing on different * CPUs. The 'pm_runcount' field is atomically incremented every time * the PMC gets scheduled on a CPU and atomically decremented when it * get descheduled. Deletion of a PMC is only permitted when this * field is '0'. * */ struct pmc_pcpu_state { uint32_t pps_overflowcnt; /* count overflow interrupts */ uint8_t pps_stalled; uint8_t pps_cpustate; } __aligned(CACHE_LINE_SIZE); struct pmc { LIST_HEAD(,pmc_target) pm_targets; /* list of target processes */ LIST_ENTRY(pmc) pm_next; /* owner's list */ /* * System-wide PMCs are allocated on a CPU and are not moved * around. For system-wide PMCs we record the CPU the PMC was * allocated on in the 'CPU' field of the pmc ID. * * Virtual PMCs run on whichever CPU is currently executing * their targets' threads. For these PMCs we need to save * their current PMC counter values when they are taken off * CPU. */ union { pmc_value_t pm_savedvalue; /* Virtual PMCS */ } pm_gv; /* * For sampling mode PMCs, we keep track of the PMC's "reload * count", which is the counter value to be loaded in when * arming the PMC for the next counting session. For counting * modes on PMCs that are read-only (e.g., the x86 TSC), we * keep track of the initial value at the start of * counting-mode operation. */ union { pmc_value_t pm_reloadcount; /* sampling PMC modes */ pmc_value_t pm_initial; /* counting PMC modes */ } pm_sc; struct pmc_pcpu_state *pm_pcpu_state; volatile cpuset_t pm_cpustate; /* CPUs where PMC should be active */ uint32_t pm_caps; /* PMC capabilities */ enum pmc_event pm_event; /* event being measured */ uint32_t pm_flags; /* additional flags PMC_F_... */ struct pmc_owner *pm_owner; /* owner thread state */ counter_u64_t pm_runcount; /* #cpus currently on */ enum pmc_state pm_state; /* current PMC state */ /* * The PMC ID field encodes the row-index for the PMC, its * mode, class and the CPU# associated with the PMC. */ pmc_id_t pm_id; /* allocated PMC id */ enum pmc_class pm_class; /* md extensions */ union pmc_md_pmc pm_md; }; /* * Accessor macros for 'struct pmc' */ #define PMC_TO_MODE(P) PMC_ID_TO_MODE((P)->pm_id) #define PMC_TO_CLASS(P) PMC_ID_TO_CLASS((P)->pm_id) #define PMC_TO_ROWINDEX(P) PMC_ID_TO_ROWINDEX((P)->pm_id) #define PMC_TO_CPU(P) PMC_ID_TO_CPU((P)->pm_id) /* * struct pmc_threadpmcstate * * Record per-PMC, per-thread state. */ struct pmc_threadpmcstate { pmc_value_t pt_pmcval; /* per-thread reload count */ }; /* * struct pmc_thread * * Record a 'target' thread being profiled. */ struct pmc_thread { LIST_ENTRY(pmc_thread) pt_next; /* linked list */ struct thread *pt_td; /* target thread */ struct pmc_threadpmcstate pt_pmcs[]; /* per-PMC state */ }; /* * struct pmc_process * * Record a 'target' process being profiled. * * The target process being profiled could be different from the owner * process which allocated the PMCs. Each target process descriptor * is associated with NHWPMC 'struct pmc *' pointers. Each PMC at a * given hardware row-index 'n' will use slot 'n' of the 'pp_pmcs[]' * array. The size of this structure is thus PMC architecture * dependent. 
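 *
 * A sketch of the sizing this implies (with 'pmd_npmc' the per-CPU
 * PMC count from 'struct pmc_mdep'):
 *
 *	pp = malloc(sizeof(struct pmc_process) +
 *	    md->pmd_npmc * sizeof(struct pmc_targetstate),
 *	    M_PMC, M_WAITOK | M_ZERO);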
* */ struct pmc_targetstate { struct pmc *pp_pmc; /* target PMC */ pmc_value_t pp_pmcval; /* per-process value */ }; struct pmc_process { LIST_ENTRY(pmc_process) pp_next; /* hash chain */ LIST_HEAD(,pmc_thread) pp_tds; /* list of threads */ struct mtx *pp_tdslock; /* lock on pp_tds thread list */ int pp_refcnt; /* reference count */ uint32_t pp_flags; /* flags PMC_PP_* */ struct proc *pp_proc; /* target process */ struct pmc_targetstate pp_pmcs[]; /* NHWPMCs */ }; #define PMC_PP_ENABLE_MSR_ACCESS 0x00000001 /* * struct pmc_owner * * We associate a PMC with an 'owner' process. * * A process can be associated with 0..NCPUS*NHWPMC PMCs during its * lifetime, where NCPUS is the numbers of CPUS in the system and * NHWPMC is the number of hardware PMCs per CPU. These are * maintained in the list headed by the 'po_pmcs' to save on space. * */ struct pmc_owner { LIST_ENTRY(pmc_owner) po_next; /* hash chain */ CK_LIST_ENTRY(pmc_owner) po_ssnext; /* (g/p) list of SS PMC owners */ LIST_HEAD(, pmc) po_pmcs; /* owned PMC list */ TAILQ_HEAD(, pmclog_buffer) po_logbuffers; /* (o) logbuffer list */ struct mtx po_mtx; /* spin lock for (o) */ struct proc *po_owner; /* owner proc */ uint32_t po_flags; /* (k) flags PMC_PO_* */ struct proc *po_kthread; /* (k) helper kthread */ struct file *po_file; /* file reference */ int po_error; /* recorded error */ short po_sscount; /* # SS PMCs owned */ short po_logprocmaps; /* global mappings done */ struct pmclog_buffer *po_curbuf[MAXCPU]; /* current log buffer */ }; #define PMC_PO_OWNS_LOGFILE 0x00000001 /* has a log file */ #define PMC_PO_SHUTDOWN 0x00000010 /* in the process of shutdown */ #define PMC_PO_INITIAL_MAPPINGS_DONE 0x00000020 /* * struct pmc_hw -- describe the state of the PMC hardware * * When in use, a HW PMC is associated with one allocated 'struct pmc' * pointed to by field 'phw_pmc'. When inactive, this field is NULL. * * On an SMP box, one or more HW PMC's in process virtual mode with * the same 'phw_pmc' could be executing on different CPUs. In order * to handle this case correctly, we need to ensure that only * incremental counts get added to the saved value in the associated * 'struct pmc'. The 'phw_save' field is used to keep the saved PMC * value at the time the hardware is started during this context * switch (i.e., the difference between the new (hardware) count and * the saved count is atomically added to the count field in 'struct * pmc' at context switch time). * */ struct pmc_hw { uint32_t phw_state; /* see PHW_* macros below */ struct pmc *phw_pmc; /* current thread PMC */ }; #define PMC_PHW_RI_MASK 0x000000FF #define PMC_PHW_CPU_SHIFT 8 #define PMC_PHW_CPU_MASK 0x0000FF00 #define PMC_PHW_FLAGS_SHIFT 16 #define PMC_PHW_FLAGS_MASK 0xFFFF0000 #define PMC_PHW_INDEX_TO_STATE(ri) ((ri) & PMC_PHW_RI_MASK) #define PMC_PHW_STATE_TO_INDEX(state) ((state) & PMC_PHW_RI_MASK) #define PMC_PHW_CPU_TO_STATE(cpu) (((cpu) << PMC_PHW_CPU_SHIFT) & \ PMC_PHW_CPU_MASK) #define PMC_PHW_STATE_TO_CPU(state) (((state) & PMC_PHW_CPU_MASK) >> \ PMC_PHW_CPU_SHIFT) #define PMC_PHW_FLAGS_TO_STATE(flags) (((flags) << PMC_PHW_FLAGS_SHIFT) & \ PMC_PHW_FLAGS_MASK) #define PMC_PHW_STATE_TO_FLAGS(state) (((state) & PMC_PHW_FLAGS_MASK) >> \ PMC_PHW_FLAGS_SHIFT) #define PMC_PHW_FLAG_IS_ENABLED (PMC_PHW_FLAGS_TO_STATE(0x01)) #define PMC_PHW_FLAG_IS_SHAREABLE (PMC_PHW_FLAGS_TO_STATE(0x02)) /* * struct pmc_sample * * Space for N (tunable) PC samples and associated control data. 
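 *
 * Samples are kept in per-CPU ring buffers ('struct pmc_samplebuffer'
 * below); 'ps_prodidx' and 'ps_considx' only ever increase and are
 * reduced to a buffer slot by the mask used in the PMC_PROD_SAMPLE()
 * and PMC_CONS_SAMPLE() accessors.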
*/ struct pmc_sample { uint16_t ps_nsamples; /* callchain depth */ uint16_t ps_nsamples_actual; uint16_t ps_cpu; /* cpu number */ uint16_t ps_flags; /* other flags */ lwpid_t ps_tid; /* thread id */ pid_t ps_pid; /* process PID or -1 */ int ps_ticks; /* ticks at sample time */ /* pad */ struct thread *ps_td; /* which thread */ struct pmc *ps_pmc; /* interrupting PMC */ uintptr_t *ps_pc; /* (const) callchain start */ uint64_t ps_tsc; /* tsc value */ }; #define PMC_SAMPLE_FREE ((uint16_t) 0) #define PMC_USER_CALLCHAIN_PENDING ((uint16_t) 0xFFFF) struct pmc_samplebuffer { volatile uint64_t ps_prodidx; /* producer index */ volatile uint64_t ps_considx; /* consumer index */ uintptr_t *ps_callchains; /* all saved call chains */ struct pmc_sample ps_samples[]; /* array of sample entries */ }; #define PMC_CONS_SAMPLE(psb) \ (&(psb)->ps_samples[(psb)->ps_considx & pmc_sample_mask]) #define PMC_CONS_SAMPLE_OFF(psb, off) \ (&(psb)->ps_samples[(off) & pmc_sample_mask]) #define PMC_PROD_SAMPLE(psb) \ (&(psb)->ps_samples[(psb)->ps_prodidx & pmc_sample_mask]) /* * struct pmc_cpustate * * A CPU is modelled as a collection of HW PMCs with space for additional * flags. */ struct pmc_cpu { uint32_t pc_state; /* physical cpu number + flags */ struct pmc_samplebuffer *pc_sb[3]; /* space for samples */ struct pmc_hw *pc_hwpmcs[]; /* 'npmc' pointers */ }; #define PMC_PCPU_CPU_MASK 0x000000FF #define PMC_PCPU_FLAGS_MASK 0xFFFFFF00 #define PMC_PCPU_FLAGS_SHIFT 8 #define PMC_PCPU_STATE_TO_CPU(S) ((S) & PMC_PCPU_CPU_MASK) #define PMC_PCPU_STATE_TO_FLAGS(S) (((S) & PMC_PCPU_FLAGS_MASK) >> PMC_PCPU_FLAGS_SHIFT) #define PMC_PCPU_FLAGS_TO_STATE(F) (((F) << PMC_PCPU_FLAGS_SHIFT) & PMC_PCPU_FLAGS_MASK) #define PMC_PCPU_CPU_TO_STATE(C) ((C) & PMC_PCPU_CPU_MASK) #define PMC_PCPU_FLAG_HTT (PMC_PCPU_FLAGS_TO_STATE(0x1)) /* * struct pmc_binding * * CPU binding information. */ struct pmc_binding { int pb_bound; /* is bound? */ int pb_cpu; /* if so, to which CPU */ u_char pb_priority; /* Thread active priority. */ }; struct pmc_mdep; /* * struct pmc_classdep * * PMC class-dependent operations. */ struct pmc_classdep { uint32_t pcd_caps; /* class capabilities */ enum pmc_class pcd_class; /* class id */ int pcd_num; /* number of PMCs */ int pcd_ri; /* row index of the first PMC in class */ int pcd_width; /* width of the PMC */ /* configuring/reading/writing the hardware PMCs */ int (*pcd_config_pmc)(int _cpu, int _ri, struct pmc *_pm); int (*pcd_get_config)(int _cpu, int _ri, struct pmc **_ppm); int (*pcd_read_pmc)(int _cpu, int _ri, pmc_value_t *_value); int (*pcd_write_pmc)(int _cpu, int _ri, pmc_value_t _value); /* pmc allocation/release */ int (*pcd_allocate_pmc)(int _cpu, int _ri, struct pmc *_t, const struct pmc_op_pmcallocate *_a); int (*pcd_release_pmc)(int _cpu, int _ri, struct pmc *_pm); /* starting and stopping PMCs */ int (*pcd_start_pmc)(int _cpu, int _ri); int (*pcd_stop_pmc)(int _cpu, int _ri); /* description */ int (*pcd_describe)(int _cpu, int _ri, struct pmc_info *_pi, struct pmc **_ppmc); /* class-dependent initialization & finalization */ int (*pcd_pcpu_init)(struct pmc_mdep *_md, int _cpu); int (*pcd_pcpu_fini)(struct pmc_mdep *_md, int _cpu); /* machine-specific interface */ int (*pcd_get_msr)(int _ri, uint32_t *_msr); }; /* * struct pmc_mdep * * Machine dependent bits needed per CPU type. */ struct pmc_mdep { uint32_t pmd_cputype; /* from enum pmc_cputype */ uint32_t pmd_npmc; /* number of PMCs per CPU */ uint32_t pmd_nclass; /* number of PMC classes present */ /* * Machine dependent methods. 
*/ /* per-cpu initialization and finalization */ int (*pmd_pcpu_init)(struct pmc_mdep *_md, int _cpu); int (*pmd_pcpu_fini)(struct pmc_mdep *_md, int _cpu); /* thread context switch in/out */ int (*pmd_switch_in)(struct pmc_cpu *_p, struct pmc_process *_pp); int (*pmd_switch_out)(struct pmc_cpu *_p, struct pmc_process *_pp); /* handle a PMC interrupt */ int (*pmd_intr)(struct trapframe *_tf); /* * PMC class dependent information. */ struct pmc_classdep pmd_classdep[]; }; /* * Per-CPU state. This is an array of 'mp_ncpu' pointers * to struct pmc_cpu descriptors. */ extern struct pmc_cpu **pmc_pcpu; /* driver statistics */ extern struct pmc_driverstats pmc_stats; #if defined(HWPMC_DEBUG) #include /* debug flags, major flag groups */ struct pmc_debugflags { int pdb_CPU; int pdb_CSW; int pdb_LOG; int pdb_MDP; int pdb_MOD; int pdb_OWN; int pdb_PMC; int pdb_PRC; int pdb_SAM; }; extern struct pmc_debugflags pmc_debugflags; #define KTR_PMC KTR_SUBSYS #define PMC_DEBUG_STRSIZE 128 #define PMC_DEBUG_DEFAULT_FLAGS { 0, 0, 0, 0, 0, 0, 0, 0, 0 } #define PMCDBG0(M, N, L, F) do { \ if (pmc_debugflags.pdb_ ## M & (1 << PMC_DEBUG_MIN_ ## N)) \ CTR0(KTR_PMC, #M ":" #N ":" #L ": " F); \ } while (0) #define PMCDBG1(M, N, L, F, p1) do { \ if (pmc_debugflags.pdb_ ## M & (1 << PMC_DEBUG_MIN_ ## N)) \ CTR1(KTR_PMC, #M ":" #N ":" #L ": " F, p1); \ } while (0) #define PMCDBG2(M, N, L, F, p1, p2) do { \ if (pmc_debugflags.pdb_ ## M & (1 << PMC_DEBUG_MIN_ ## N)) \ CTR2(KTR_PMC, #M ":" #N ":" #L ": " F, p1, p2); \ } while (0) #define PMCDBG3(M, N, L, F, p1, p2, p3) do { \ if (pmc_debugflags.pdb_ ## M & (1 << PMC_DEBUG_MIN_ ## N)) \ CTR3(KTR_PMC, #M ":" #N ":" #L ": " F, p1, p2, p3); \ } while (0) #define PMCDBG4(M, N, L, F, p1, p2, p3, p4) do { \ if (pmc_debugflags.pdb_ ## M & (1 << PMC_DEBUG_MIN_ ## N)) \ CTR4(KTR_PMC, #M ":" #N ":" #L ": " F, p1, p2, p3, p4);\ } while (0) #define PMCDBG5(M, N, L, F, p1, p2, p3, p4, p5) do { \ if (pmc_debugflags.pdb_ ## M & (1 << PMC_DEBUG_MIN_ ## N)) \ CTR5(KTR_PMC, #M ":" #N ":" #L ": " F, p1, p2, p3, p4, \ p5); \ } while (0) #define PMCDBG6(M, N, L, F, p1, p2, p3, p4, p5, p6) do { \ if (pmc_debugflags.pdb_ ## M & (1 << PMC_DEBUG_MIN_ ## N)) \ CTR6(KTR_PMC, #M ":" #N ":" #L ": " F, p1, p2, p3, p4, \ p5, p6); \ } while (0) /* Major numbers */ #define PMC_DEBUG_MAJ_CPU 0 /* cpu switches */ #define PMC_DEBUG_MAJ_CSW 1 /* context switches */ #define PMC_DEBUG_MAJ_LOG 2 /* logging */ #define PMC_DEBUG_MAJ_MDP 3 /* machine dependent */ #define PMC_DEBUG_MAJ_MOD 4 /* misc module infrastructure */ #define PMC_DEBUG_MAJ_OWN 5 /* owner */ #define PMC_DEBUG_MAJ_PMC 6 /* pmc management */ #define PMC_DEBUG_MAJ_PRC 7 /* processes */ #define PMC_DEBUG_MAJ_SAM 8 /* sampling */ /* Minor numbers */ /* Common (8 bits) */ #define PMC_DEBUG_MIN_ALL 0 /* allocation */ #define PMC_DEBUG_MIN_REL 1 /* release */ #define PMC_DEBUG_MIN_OPS 2 /* ops: start, stop, ... 
*/ #define PMC_DEBUG_MIN_INI 3 /* init */ #define PMC_DEBUG_MIN_FND 4 /* find */ /* MODULE */ #define PMC_DEBUG_MIN_PMH 14 /* pmc_hook */ #define PMC_DEBUG_MIN_PMS 15 /* pmc_syscall */ /* OWN */ #define PMC_DEBUG_MIN_ORM 8 /* owner remove */ #define PMC_DEBUG_MIN_OMR 9 /* owner maybe remove */ /* PROCESSES */ #define PMC_DEBUG_MIN_TLK 8 /* link target */ #define PMC_DEBUG_MIN_TUL 9 /* unlink target */ #define PMC_DEBUG_MIN_EXT 10 /* process exit */ #define PMC_DEBUG_MIN_EXC 11 /* process exec */ #define PMC_DEBUG_MIN_FRK 12 /* process fork */ #define PMC_DEBUG_MIN_ATT 13 /* attach/detach */ #define PMC_DEBUG_MIN_SIG 14 /* signalling */ /* CONTEXT SWITCHES */ #define PMC_DEBUG_MIN_SWI 8 /* switch in */ #define PMC_DEBUG_MIN_SWO 9 /* switch out */ /* PMC */ #define PMC_DEBUG_MIN_REG 8 /* pmc register */ #define PMC_DEBUG_MIN_ALR 9 /* allocate row */ /* MACHINE DEPENDENT LAYER */ #define PMC_DEBUG_MIN_REA 8 /* read */ #define PMC_DEBUG_MIN_WRI 9 /* write */ #define PMC_DEBUG_MIN_CFG 10 /* config */ #define PMC_DEBUG_MIN_STA 11 /* start */ #define PMC_DEBUG_MIN_STO 12 /* stop */ #define PMC_DEBUG_MIN_INT 13 /* interrupts */ /* CPU */ #define PMC_DEBUG_MIN_BND 8 /* bind */ #define PMC_DEBUG_MIN_SEL 9 /* select */ /* LOG */ #define PMC_DEBUG_MIN_GTB 8 /* get buf */ #define PMC_DEBUG_MIN_SIO 9 /* schedule i/o */ #define PMC_DEBUG_MIN_FLS 10 /* flush */ #define PMC_DEBUG_MIN_SAM 11 /* sample */ #define PMC_DEBUG_MIN_CLO 12 /* close */ #else #define PMCDBG0(M, N, L, F) /* nothing */ #define PMCDBG1(M, N, L, F, p1) #define PMCDBG2(M, N, L, F, p1, p2) #define PMCDBG3(M, N, L, F, p1, p2, p3) #define PMCDBG4(M, N, L, F, p1, p2, p3, p4) #define PMCDBG5(M, N, L, F, p1, p2, p3, p4, p5) #define PMCDBG6(M, N, L, F, p1, p2, p3, p4, p5, p6) #endif /* declare a dedicated memory pool */ MALLOC_DECLARE(M_PMC); /* * Functions */ struct pmc_mdep *pmc_md_initialize(void); /* MD init function */ void pmc_md_finalize(struct pmc_mdep *_md); /* MD fini function */ int pmc_getrowdisp(int _ri); int pmc_process_interrupt(int _ring, struct pmc *_pm, struct trapframe *_tf); int pmc_save_kernel_callchain(uintptr_t *_cc, int _maxsamples, struct trapframe *_tf); int pmc_save_user_callchain(uintptr_t *_cc, int _maxsamples, struct trapframe *_tf); void pmc_restore_cpu_binding(struct pmc_binding *pb); void pmc_save_cpu_binding(struct pmc_binding *pb); void pmc_select_cpu(int cpu); struct pmc_mdep *pmc_mdep_alloc(int nclasses); void pmc_mdep_free(struct pmc_mdep *md); uint64_t pmc_rdtsc(void); #endif /* _KERNEL */ #endif /* _SYS_PMC_H_ */ diff --git a/usr.sbin/pmcstat/pmcstat.c b/usr.sbin/pmcstat/pmcstat.c index 3e2d101ab113..08e43d5d446a 100644 --- a/usr.sbin/pmcstat/pmcstat.c +++ b/usr.sbin/pmcstat/pmcstat.c @@ -1,1467 +1,1521 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 2003-2008, Joseph Koshy * Copyright (c) 2007 The FreeBSD Foundation * All rights reserved. * * Portions of this software were developed by A. Joseph Koshy under * sponsorship from the FreeBSD Foundation and Google, Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. 
* * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include +#include #include #include #include #include #include #include #include #include "pmcstat.h" /* * A given invocation of pmcstat(8) can manage multiple PMCs of both * the system-wide and per-process variety. Each of these could be in * 'counting mode' or in 'sampling mode'. * * For 'counting mode' PMCs, pmcstat(8) will periodically issue a * pmc_read() at the configured time interval and print out the value * of the requested PMCs. * * For 'sampling mode' PMCs it can log to a file for offline analysis, * or can analyse sampling data "on the fly", either by converting * samples to printed textual form or by creating gprof(1) compatible * profiles, one per program executed. When creating gprof(1) * profiles it can optionally merge entries from multiple processes * for a given executable into a single profile file. * * pmcstat(8) can also execute a command line and attach PMCs to the * resulting child process. The protocol used is as follows: * * - parent creates a socketpair for two way communication and * fork()s. * - subsequently: * * /Parent/ /Child/ * * - Wait for childs token. * - Sends token. * - Awaits signal to start. * - Attaches PMCs to the child's pid * and starts them. Sets up * monitoring for the child. * - Signals child to start. * - Receives signal, attempts exec(). * * After this point normal processing can happen. */ /* Globals */ int pmcstat_displayheight = DEFAULT_DISPLAY_HEIGHT; int pmcstat_displaywidth = DEFAULT_DISPLAY_WIDTH; static int pmcstat_sockpair[NSOCKPAIRFD]; static int pmcstat_kq; static kvm_t *pmcstat_kvm; static struct kinfo_proc *pmcstat_plist; struct pmcstat_args args; +static bool libpmc_initialized = false; static void pmcstat_get_cpumask(const char *cpuspec, cpuset_t *cpumask) { int cpu; const char *s; char *end; CPU_ZERO(cpumask); s = cpuspec; do { cpu = strtol(s, &end, 0); if (cpu < 0 || end == s) errx(EX_USAGE, "ERROR: Illegal CPU specification \"%s\".", cpuspec); CPU_SET(cpu, cpumask); s = end + strspn(end, ", \t"); } while (*s); assert(!CPU_EMPTY(cpumask)); } void pmcstat_cleanup(void) { struct pmcstat_ev *ev; /* release allocated PMCs. */ STAILQ_FOREACH(ev, &args.pa_events, ev_next) if (ev->ev_pmcid != PMC_ID_INVALID) { if (pmc_stop(ev->ev_pmcid) < 0) err(EX_OSERR, "ERROR: cannot stop pmc 0x%x \"%s\"", ev->ev_pmcid, ev->ev_name); if (pmc_release(ev->ev_pmcid) < 0) err(EX_OSERR, "ERROR: cannot release pmc 0x%x \"%s\"", ev->ev_pmcid, ev->ev_name); } /* de-configure the log file if present. 
*/ if (args.pa_flags & (FLAG_HAS_PIPE | FLAG_HAS_OUTPUT_LOGFILE)) (void) pmc_configure_logfile(-1); if (args.pa_logparser) { pmclog_close(args.pa_logparser); args.pa_logparser = NULL; } pmcstat_log_shutdown_logging(); } void pmcstat_find_targets(const char *spec) { int n, nproc, pid, rv; struct pmcstat_target *pt; char errbuf[_POSIX2_LINE_MAX], *end; static struct kinfo_proc *kp; regex_t reg; regmatch_t regmatch; /* First check if we've been given a process id. */ pid = strtol(spec, &end, 0); if (end != spec && pid >= 0) { if ((pt = malloc(sizeof(*pt))) == NULL) goto outofmemory; pt->pt_pid = pid; SLIST_INSERT_HEAD(&args.pa_targets, pt, pt_next); return; } /* Otherwise treat arg as a regular expression naming processes. */ if (pmcstat_kvm == NULL) { if ((pmcstat_kvm = kvm_openfiles(NULL, "/dev/null", NULL, 0, errbuf)) == NULL) err(EX_OSERR, "ERROR: Cannot open kernel \"%s\"", errbuf); if ((pmcstat_plist = kvm_getprocs(pmcstat_kvm, KERN_PROC_PROC, 0, &nproc)) == NULL) err(EX_OSERR, "ERROR: Cannot get process list: %s", kvm_geterr(pmcstat_kvm)); } else nproc = 0; if ((rv = regcomp(®, spec, REG_EXTENDED|REG_NOSUB)) != 0) { regerror(rv, ®, errbuf, sizeof(errbuf)); err(EX_DATAERR, "ERROR: Failed to compile regex \"%s\": %s", spec, errbuf); } for (n = 0, kp = pmcstat_plist; n < nproc; n++, kp++) { if ((rv = regexec(®, kp->ki_comm, 1, ®match, 0)) == 0) { if ((pt = malloc(sizeof(*pt))) == NULL) goto outofmemory; pt->pt_pid = kp->ki_pid; SLIST_INSERT_HEAD(&args.pa_targets, pt, pt_next); } else if (rv != REG_NOMATCH) { regerror(rv, ®, errbuf, sizeof(errbuf)); errx(EX_SOFTWARE, "ERROR: Regex evalation failed: %s", errbuf); } } regfree(®); return; outofmemory: errx(EX_SOFTWARE, "Out of memory."); /*NOTREACHED*/ } void pmcstat_kill_process(void) { struct pmcstat_target *pt; assert(args.pa_flags & FLAG_HAS_COMMANDLINE); /* * If a command line was specified, it would be the very first * in the list, before any other processes specified by -t. */ pt = SLIST_FIRST(&args.pa_targets); assert(pt != NULL); if (kill(pt->pt_pid, SIGINT) != 0) err(EX_OSERR, "ERROR: cannot signal child process"); } void pmcstat_start_pmcs(void) { struct pmcstat_ev *ev; STAILQ_FOREACH(ev, &args.pa_events, ev_next) { assert(ev->ev_pmcid != PMC_ID_INVALID); if (pmc_start(ev->ev_pmcid) < 0) { warn("ERROR: Cannot start pmc 0x%x \"%s\"", ev->ev_pmcid, ev->ev_name); pmcstat_cleanup(); exit(EX_OSERR); } } } void pmcstat_print_headers(void) { struct pmcstat_ev *ev; int c, w; (void) fprintf(args.pa_printfile, PRINT_HEADER_PREFIX); STAILQ_FOREACH(ev, &args.pa_events, ev_next) { if (PMC_IS_SAMPLING_MODE(ev->ev_mode)) continue; c = PMC_IS_SYSTEM_MODE(ev->ev_mode) ? 's' : 'p'; if (ev->ev_fieldskip != 0) (void) fprintf(args.pa_printfile, "%*s", ev->ev_fieldskip, ""); w = ev->ev_fieldwidth - ev->ev_fieldskip - 2; if (c == 's') (void) fprintf(args.pa_printfile, "s/%02d/%-*s ", ev->ev_cpu, w-3, ev->ev_name); else (void) fprintf(args.pa_printfile, "p/%*s ", w, ev->ev_name); } (void) fflush(args.pa_printfile); } void pmcstat_print_counters(void) { int extra_width; struct pmcstat_ev *ev; pmc_value_t value; extra_width = sizeof(PRINT_HEADER_PREFIX) - 1; STAILQ_FOREACH(ev, &args.pa_events, ev_next) { /* skip sampling mode counters */ if (PMC_IS_SAMPLING_MODE(ev->ev_mode)) continue; if (pmc_read(ev->ev_pmcid, &value) < 0) err(EX_OSERR, "ERROR: Cannot read pmc \"%s\"", ev->ev_name); (void) fprintf(args.pa_printfile, "%*ju ", ev->ev_fieldwidth + extra_width, (uintmax_t) ev->ev_cumulative ? 
value : (value - ev->ev_saved)); if (ev->ev_cumulative == 0) ev->ev_saved = value; extra_width = 0; } (void) fflush(args.pa_printfile); } /* * Print output */ void pmcstat_print_pmcs(void) { static int linecount = 0; /* check if we need to print a header line */ if (++linecount > pmcstat_displayheight) { (void) fprintf(args.pa_printfile, "\n"); linecount = 1; } if (linecount == 1) pmcstat_print_headers(); (void) fprintf(args.pa_printfile, "\n"); pmcstat_print_counters(); return; } void pmcstat_show_usage(void) { errx(EX_USAGE, "[options] [commandline]\n" "\t Measure process and/or system performance using hardware\n" "\t performance monitoring counters.\n" "\t Options include:\n" "\t -C\t\t (toggle) show cumulative counts\n" "\t -D path\t create profiles in directory \"path\"\n" "\t -E\t\t (toggle) show counts at process exit\n" "\t -F file\t write a system-wide callgraph (Kcachegrind format)" " to \"file\"\n" "\t -G file\t write a system-wide callgraph to \"file\"\n" "\t -I\t\t don't resolve leaf function name, show address instead\n" "\t -L\t\t list all counters available on this host\n" "\t -M file\t print executable/gmon file map to \"file\"\n" "\t -N\t\t (toggle) capture callchains\n" "\t -O file\t send log output to \"file\"\n" "\t -P spec\t allocate a process-private sampling PMC\n" "\t -R file\t read events from \"file\"\n" "\t -S spec\t allocate a system-wide sampling PMC\n" "\t -T\t\t start in top mode\n" "\t -U \t\t merged user kernel stack capture\n" "\t -W\t\t (toggle) show counts per context switch\n" "\t -a file\t print sampled PCs and callgraph to \"file\"\n" "\t -c cpu-list\t set cpus for subsequent system-wide PMCs\n" "\t -d\t\t (toggle) track descendants\n" "\t -e\t\t use wide history counter for gprof(1) output\n" "\t -f spec\t pass \"spec\" to as plugin option\n" "\t -g\t\t produce gprof(1) compatible profiles\n" "\t -i lwp\t\t filter on thread id \"lwp\" in post-processing\n" "\t -k dir\t\t set the path to the kernel\n" "\t -l secs\t set duration time\n" "\t -m file\t print sampled PCs to \"file\"\n" "\t -n rate\t set sampling rate\n" "\t -o file\t send print output to \"file\"\n" "\t -p spec\t allocate a process-private counting PMC\n" "\t -q\t\t suppress verbosity\n" "\t -r fsroot\t specify FS root directory\n" "\t -s spec\t allocate a system-wide counting PMC\n" "\t -t process-spec attach to running processes matching " "\"process-spec\"\n" "\t -u spec \t provide short description of counters matching spec\n" "\t -v\t\t increase verbosity\n" "\t -w secs\t set printing time interval\n" "\t -z depth\t limit callchain display depth" ); } /* * At exit handler for top mode */ void pmcstat_topexit(void) { if (!args.pa_toptty) return; /* * Shutdown ncurses. 
*/ clrtoeol(); refresh(); endwin(); } +static inline void +libpmc_initialize(int *npmc) +{ + + if (libpmc_initialized) + return; + if (pmc_init() < 0) + err(EX_UNAVAILABLE, "ERROR: Initialization of the pmc(3)" + " library failed"); + + /* assume all CPUs are identical */ + if ((*npmc = pmc_npmc(0)) < 0) + err(EX_OSERR, "ERROR: Cannot determine the number of PMCs on " + "CPU %d", 0); + libpmc_initialized = true; +} /* * Main */ int main(int argc, char **argv) { - cpuset_t cpumask, rootmask; + cpuset_t cpumask, dommask, rootmask; double interval; double duration; int option, npmc; int c, check_driver_stats; int do_callchain, do_descendants, do_logproccsw, do_logprocexit; - int do_print, do_read, do_listcounters, do_descr; - int do_userspace; + int do_print, do_read, do_listcounters, do_descr, domains; + int do_userspace, i; size_t len; int graphdepth; int pipefd[2], rfd; int use_cumulative_counts; short cf, cb; uint64_t current_sampling_count; char *end, *tmp, *event; const char *errmsg, *graphfilename; enum pmcstat_state runstate; struct pmc_driverstats ds_start, ds_end; struct pmcstat_ev *ev; struct sigaction sa; struct kevent kev; struct winsize ws; struct stat sb; char buffer[PATH_MAX]; + uint32_t caps; check_driver_stats = 0; current_sampling_count = 0; do_callchain = 1; do_descr = 0; do_descendants = 0; do_userspace = 0; do_logproccsw = 0; do_logprocexit = 0; do_listcounters = 0; + domains = 0; use_cumulative_counts = 0; graphfilename = "-"; args.pa_required = 0; args.pa_flags = 0; args.pa_verbosity = 1; args.pa_logfd = -1; args.pa_fsroot = ""; args.pa_samplesdir = "."; args.pa_printfile = stderr; args.pa_graphdepth = DEFAULT_CALLGRAPH_DEPTH; args.pa_graphfile = NULL; args.pa_interval = DEFAULT_WAIT_INTERVAL; args.pa_mapfilename = NULL; args.pa_inputpath = NULL; args.pa_outputpath = NULL; args.pa_pplugin = PMCSTAT_PL_NONE; args.pa_plugin = PMCSTAT_PL_NONE; args.pa_ctdumpinstr = 1; args.pa_topmode = PMCSTAT_TOP_DELTA; args.pa_toptty = 0; args.pa_topcolor = 0; args.pa_mergepmc = 0; args.pa_duration = 0.0; STAILQ_INIT(&args.pa_events); SLIST_INIT(&args.pa_targets); bzero(&ds_start, sizeof(ds_start)); bzero(&ds_end, sizeof(ds_end)); ev = NULL; event = NULL; + caps = 0; CPU_ZERO(&cpumask); + /* Default to using the running system kernel. */ len = 0; if (sysctlbyname("kern.bootfile", NULL, &len, NULL, 0) == -1) err(EX_OSERR, "ERROR: Cannot determine path of running kernel"); args.pa_kernel = malloc(len); if (args.pa_kernel == NULL) errx(EX_SOFTWARE, "ERROR: Out of memory."); if (sysctlbyname("kern.bootfile", args.pa_kernel, &len, NULL, 0) == -1) err(EX_OSERR, "ERROR: Cannot determine path of running kernel"); + len = sizeof(domains); + if (sysctlbyname("vm.ndomains", &domains, &len, NULL, 0) == -1) + err(EX_OSERR, "ERROR: Cannot get number of domains"); /* * The initial CPU mask specifies the root mask of this process * which is usually all CPUs in the system. 
*/ if (cpuset_getaffinity(CPU_LEVEL_ROOT, CPU_WHICH_PID, -1, sizeof(rootmask), &rootmask) == -1) err(EX_OSERR, "ERROR: Cannot determine the root set of CPUs"); CPU_COPY(&rootmask, &cpumask); while ((option = getopt(argc, argv, "ACD:EF:G:ILM:NO:P:R:S:TUWZa:c:def:gi:k:l:m:n:o:p:qr:s:t:u:vw:z:")) != -1) switch (option) { case 'A': args.pa_flags |= FLAG_SKIP_TOP_FN_RES; break; case 'a': /* Annotate + callgraph */ args.pa_flags |= FLAG_DO_ANNOTATE; args.pa_plugin = PMCSTAT_PL_ANNOTATE_CG; graphfilename = optarg; break; case 'C': /* cumulative values */ use_cumulative_counts = !use_cumulative_counts; args.pa_required |= FLAG_HAS_COUNTING_PMCS; break; case 'c': /* CPU */ if (optarg[0] == '*' && optarg[1] == '\0') CPU_COPY(&rootmask, &cpumask); else pmcstat_get_cpumask(optarg, &cpumask); args.pa_flags |= FLAGS_HAS_CPUMASK; args.pa_required |= FLAG_HAS_SYSTEM_PMCS; break; case 'D': if (stat(optarg, &sb) < 0) err(EX_OSERR, "ERROR: Cannot stat \"%s\"", optarg); if (!S_ISDIR(sb.st_mode)) errx(EX_USAGE, "ERROR: \"%s\" is not a directory.", optarg); args.pa_samplesdir = optarg; args.pa_flags |= FLAG_HAS_SAMPLESDIR; args.pa_required |= FLAG_DO_GPROF; break; case 'd': /* toggle descendents */ do_descendants = !do_descendants; args.pa_required |= FLAG_HAS_PROCESS_PMCS; break; case 'E': /* log process exit */ do_logprocexit = !do_logprocexit; args.pa_required |= (FLAG_HAS_PROCESS_PMCS | FLAG_HAS_COUNTING_PMCS | FLAG_HAS_OUTPUT_LOGFILE); break; case 'e': /* wide gprof metrics */ args.pa_flags |= FLAG_DO_WIDE_GPROF_HC; break; case 'F': /* produce a system-wide calltree */ args.pa_flags |= FLAG_DO_CALLGRAPHS; args.pa_plugin = PMCSTAT_PL_CALLTREE; graphfilename = optarg; break; case 'f': /* plugins options */ if (args.pa_plugin == PMCSTAT_PL_NONE) err(EX_USAGE, "ERROR: Need -g/-G/-m/-T."); pmcstat_pluginconfigure_log(optarg); break; case 'G': /* produce a system-wide callgraph */ args.pa_flags |= FLAG_DO_CALLGRAPHS; args.pa_plugin = PMCSTAT_PL_CALLGRAPH; graphfilename = optarg; break; case 'g': /* produce gprof compatible profiles */ args.pa_flags |= FLAG_DO_GPROF; args.pa_pplugin = PMCSTAT_PL_CALLGRAPH; args.pa_plugin = PMCSTAT_PL_GPROF; break; case 'i': args.pa_flags |= FLAG_FILTER_THREAD_ID; args.pa_tid = strtol(optarg, &end, 0); break; case 'I': args.pa_flags |= FLAG_SHOW_OFFSET; break; case 'k': /* pathname to the kernel */ free(args.pa_kernel); args.pa_kernel = strdup(optarg); if (args.pa_kernel == NULL) errx(EX_SOFTWARE, "ERROR: Out of memory"); args.pa_required |= FLAG_DO_ANALYSIS; args.pa_flags |= FLAG_HAS_KERNELPATH; break; case 'L': do_listcounters = 1; break; case 'l': /* time duration in seconds */ duration = strtod(optarg, &end); if (*end != '\0' || duration <= 0) errx(EX_USAGE, "ERROR: Illegal duration time " "value \"%s\".", optarg); args.pa_flags |= FLAG_HAS_DURATION; args.pa_duration = duration; break; case 'm': args.pa_flags |= FLAG_DO_ANNOTATE; args.pa_plugin = PMCSTAT_PL_ANNOTATE; graphfilename = optarg; break; case 'M': /* mapfile */ args.pa_mapfilename = optarg; break; case 'N': do_callchain = !do_callchain; args.pa_required |= FLAG_HAS_SAMPLING_PMCS; break; case 'p': /* process virtual counting PMC */ case 's': /* system-wide counting PMC */ case 'P': /* process virtual sampling PMC */ case 'S': /* system-wide sampling PMC */ + caps = 0; if ((ev = malloc(sizeof(*ev))) == NULL) errx(EX_SOFTWARE, "ERROR: Out of memory."); switch (option) { case 'p': ev->ev_mode = PMC_MODE_TC; break; case 's': ev->ev_mode = PMC_MODE_SC; break; case 'P': ev->ev_mode = PMC_MODE_TS; break; case 'S': 
ev->ev_mode = PMC_MODE_SS; break; } if (option == 'P' || option == 'p') { args.pa_flags |= FLAG_HAS_PROCESS_PMCS; args.pa_required |= (FLAG_HAS_COMMANDLINE | FLAG_HAS_TARGET); } if (option == 'P' || option == 'S') { args.pa_flags |= FLAG_HAS_SAMPLING_PMCS; args.pa_required |= (FLAG_HAS_PIPE | FLAG_HAS_OUTPUT_LOGFILE); } if (option == 'p' || option == 's') args.pa_flags |= FLAG_HAS_COUNTING_PMCS; if (option == 's' || option == 'S') args.pa_flags |= FLAG_HAS_SYSTEM_PMCS; ev->ev_spec = strdup(optarg); if (ev->ev_spec == NULL) errx(EX_SOFTWARE, "ERROR: Out of memory."); if (option == 'S' || option == 'P') ev->ev_count = current_sampling_count ? current_sampling_count : pmc_pmu_sample_rate_get(ev->ev_spec); else ev->ev_count = 0; if (option == 'S' || option == 's') ev->ev_cpu = CPU_FFS(&cpumask) - 1; else ev->ev_cpu = PMC_CPU_ANY; ev->ev_flags = 0; if (do_callchain) { ev->ev_flags |= PMC_F_CALLCHAIN; if (do_userspace) ev->ev_flags |= PMC_F_USERCALLCHAIN; } if (do_descendants) ev->ev_flags |= PMC_F_DESCENDANTS; if (do_logprocexit) ev->ev_flags |= PMC_F_LOG_PROCEXIT; if (do_logproccsw) ev->ev_flags |= PMC_F_LOG_PROCCSW; ev->ev_cumulative = use_cumulative_counts; ev->ev_saved = 0LL; ev->ev_pmcid = PMC_ID_INVALID; /* extract event name */ c = strcspn(optarg, ", \t"); ev->ev_name = malloc(c + 1); if (ev->ev_name == NULL) errx(EX_SOFTWARE, "ERROR: Out of memory."); (void) strncpy(ev->ev_name, optarg, c); *(ev->ev_name + c) = '\0'; + libpmc_initialize(&npmc); + if (args.pa_flags & FLAG_HAS_SYSTEM_PMCS) { + if (pmc_allocate(ev->ev_spec, ev->ev_mode, + ev->ev_flags, ev->ev_cpu, &ev->ev_pmcid, + ev->ev_count) < 0) + err(EX_OSERR, "ERROR: Cannot allocate " + "system-mode pmc with specification" + " \"%s\"", ev->ev_spec); + if (pmc_capabilities(ev->ev_pmcid, &caps)) { + pmc_release(ev->ev_pmcid); + err(EX_OSERR, "ERROR: Cannot get pmc " + "capabilities"); + } + } + STAILQ_INSERT_TAIL(&args.pa_events, ev, ev_next); + if ((caps & PMC_CAP_SYSWIDE) == PMC_CAP_SYSWIDE) + break; + if ((caps & PMC_CAP_DOMWIDE) == PMC_CAP_DOMWIDE) { + CPU_ZERO(&cpumask); + /* + * Get number of domains and allocate one + * counter in each. + * First already allocated. 
+ */ + for (i = 1; i < domains; i++) { + CPU_ZERO(&dommask); + cpuset_getaffinity(CPU_LEVEL_WHICH, + CPU_WHICH_DOMAIN, i, sizeof(dommask), + &dommask); + CPU_SET(CPU_FFS(&dommask) - 1, &cpumask); + } + args.pa_flags |= FLAGS_HAS_CPUMASK; + } if (option == 's' || option == 'S') { CPU_CLR(ev->ev_cpu, &cpumask); + pmc_id_t saved_pmcid = ev->ev_pmcid; + ev->ev_pmcid = PMC_ID_INVALID; pmcstat_clone_event_descriptor(ev, &cpumask, &args); + ev->ev_pmcid = saved_pmcid; CPU_SET(ev->ev_cpu, &cpumask); } break; case 'n': /* sampling count */ current_sampling_count = strtol(optarg, &end, 0); if (*end != '\0' || current_sampling_count <= 0) errx(EX_USAGE, "ERROR: Illegal count value \"%s\".", optarg); args.pa_required |= FLAG_HAS_SAMPLING_PMCS; break; case 'o': /* outputfile */ if (args.pa_printfile != NULL && args.pa_printfile != stdout && args.pa_printfile != stderr) (void) fclose(args.pa_printfile); if ((args.pa_printfile = fopen(optarg, "w")) == NULL) errx(EX_OSERR, "ERROR: cannot open \"%s\" for writing.", optarg); args.pa_flags |= FLAG_DO_PRINT; break; case 'O': /* sampling output */ if (args.pa_outputpath) errx(EX_USAGE, "ERROR: option -O may only be specified once."); args.pa_outputpath = optarg; args.pa_flags |= FLAG_HAS_OUTPUT_LOGFILE; break; case 'q': /* quiet mode */ args.pa_verbosity = 0; break; case 'r': /* root FS path */ args.pa_fsroot = optarg; break; case 'R': /* read an existing log file */ if (args.pa_inputpath != NULL) errx(EX_USAGE, "ERROR: option -R may only be specified once."); args.pa_inputpath = optarg; if (args.pa_printfile == stderr) args.pa_printfile = stdout; args.pa_flags |= FLAG_READ_LOGFILE; break; case 't': /* target pid or process name */ pmcstat_find_targets(optarg); args.pa_flags |= FLAG_HAS_TARGET; args.pa_required |= FLAG_HAS_PROCESS_PMCS; break; case 'T': /* top mode */ args.pa_flags |= FLAG_DO_TOP; args.pa_plugin = PMCSTAT_PL_CALLGRAPH; args.pa_ctdumpinstr = 0; args.pa_mergepmc = 1; if (args.pa_printfile == stderr) args.pa_printfile = stdout; break; case 'u': do_descr = 1; event = optarg; break; case 'U': /* toggle user-space callchain capture */ do_userspace = !do_userspace; args.pa_required |= FLAG_HAS_SAMPLING_PMCS; break; case 'v': /* verbose */ args.pa_verbosity++; break; case 'w': /* wait interval */ interval = strtod(optarg, &end); if (*end != '\0' || interval <= 0) errx(EX_USAGE, "ERROR: Illegal wait interval value \"%s\".", optarg); args.pa_flags |= FLAG_HAS_WAIT_INTERVAL; args.pa_interval = interval; break; case 'W': /* toggle LOG_CSW */ do_logproccsw = !do_logproccsw; args.pa_required |= (FLAG_HAS_PROCESS_PMCS | FLAG_HAS_COUNTING_PMCS | FLAG_HAS_OUTPUT_LOGFILE); break; case 'z': graphdepth = strtod(optarg, &end); if (*end != '\0' || graphdepth <= 0) errx(EX_USAGE, "ERROR: Illegal callchain depth \"%s\".", optarg); args.pa_graphdepth = graphdepth; args.pa_required |= FLAG_DO_CALLGRAPHS; break; case '?': default: pmcstat_show_usage(); break; } if ((do_listcounters | do_descr) && pmc_pmu_enabled() == 0) errx(EX_USAGE, "pmu features not supported on host or hwpmc not loaded"); if (do_listcounters) { pmc_pmu_print_counters(NULL); } else if (do_descr) { pmc_pmu_print_counter_desc(event); } if (do_listcounters | do_descr) exit(0); args.pa_argc = (argc -= optind); args.pa_argv = (argv += optind); /* If we read from logfile and no specified CPU mask use * the maximum CPU count. */ if ((args.pa_flags & FLAG_READ_LOGFILE) && (args.pa_flags & FLAGS_HAS_CPUMASK) == 0) CPU_FILL(&cpumask); args.pa_cpumask = cpumask; /* For selecting CPUs using -R. 
*/ if (argc) /* command line present */ args.pa_flags |= FLAG_HAS_COMMANDLINE; if (args.pa_flags & (FLAG_DO_GPROF | FLAG_DO_CALLGRAPHS | FLAG_DO_ANNOTATE | FLAG_DO_TOP)) args.pa_flags |= FLAG_DO_ANALYSIS; /* * Check invocation syntax. */ /* disallow -O and -R together */ if (args.pa_outputpath && args.pa_inputpath) errx(EX_USAGE, "ERROR: options -O and -R are mutually exclusive."); /* disallow -T and -l together */ if ((args.pa_flags & FLAG_HAS_DURATION) && (args.pa_flags & FLAG_DO_TOP)) errx(EX_USAGE, "ERROR: options -T and -l are mutually " "exclusive."); /* -a and -m require -R */ if (args.pa_flags & FLAG_DO_ANNOTATE && args.pa_inputpath == NULL) errx(EX_USAGE, "ERROR: option %s requires an input file", args.pa_plugin == PMCSTAT_PL_ANNOTATE ? "-m" : "-a"); /* -m option is not allowed combined with -g or -G. */ if (args.pa_flags & FLAG_DO_ANNOTATE && args.pa_flags & (FLAG_DO_GPROF | FLAG_DO_CALLGRAPHS)) errx(EX_USAGE, "ERROR: option -m and -g | -G are mutually exclusive"); if (args.pa_flags & FLAG_READ_LOGFILE) { errmsg = NULL; if (args.pa_flags & FLAG_HAS_COMMANDLINE) errmsg = "a command line specification"; else if (args.pa_flags & FLAG_HAS_TARGET) errmsg = "option -t"; else if (!STAILQ_EMPTY(&args.pa_events)) errmsg = "a PMC event specification"; if (errmsg) errx(EX_USAGE, "ERROR: option -R may not be used with %s.", errmsg); } else if (STAILQ_EMPTY(&args.pa_events)) /* All other uses require a PMC spec. */ pmcstat_show_usage(); /* check for -t pid without a process PMC spec */ if ((args.pa_flags & FLAG_HAS_TARGET) && (args.pa_required & FLAG_HAS_PROCESS_PMCS) && (args.pa_flags & FLAG_HAS_PROCESS_PMCS) == 0) errx(EX_USAGE, "ERROR: option -t requires a process mode PMC to be specified." ); /* check for process-mode options without a command or -t pid */ if ((args.pa_required & FLAG_HAS_PROCESS_PMCS) && (args.pa_flags & (FLAG_HAS_COMMANDLINE | FLAG_HAS_TARGET)) == 0) errx(EX_USAGE, "ERROR: options -d, -E, -p, -P, and -W require a command line or target process." ); /* check for -p | -P without a target process of some sort */ if ((args.pa_required & (FLAG_HAS_COMMANDLINE | FLAG_HAS_TARGET)) && (args.pa_flags & (FLAG_HAS_COMMANDLINE | FLAG_HAS_TARGET)) == 0) errx(EX_USAGE, "ERROR: options -P and -p require a target process or a command line." ); /* check for process-mode options without a process-mode PMC */ if ((args.pa_required & FLAG_HAS_PROCESS_PMCS) && (args.pa_flags & FLAG_HAS_PROCESS_PMCS) == 0) errx(EX_USAGE, "ERROR: options -d, -E, and -W require a process mode PMC to be specified." ); /* check for -c cpu with no system mode PMCs or logfile. */ if ((args.pa_required & FLAG_HAS_SYSTEM_PMCS) && (args.pa_flags & FLAG_HAS_SYSTEM_PMCS) == 0 && (args.pa_flags & FLAG_READ_LOGFILE) == 0) errx(EX_USAGE, "ERROR: option -c requires at least one system mode PMC to be specified." ); /* check for counting mode options without a counting PMC */ if ((args.pa_required & FLAG_HAS_COUNTING_PMCS) && (args.pa_flags & FLAG_HAS_COUNTING_PMCS) == 0) errx(EX_USAGE, "ERROR: options -C, -W and -o require at least one counting mode PMC to be specified." ); /* check for sampling mode options without a sampling PMC spec */ if ((args.pa_required & FLAG_HAS_SAMPLING_PMCS) && (args.pa_flags & FLAG_HAS_SAMPLING_PMCS) == 0) errx(EX_USAGE, "ERROR: options -N, -n and -O require at least one sampling mode PMC to be specified." 
); /* check if -g/-G/-m/-T are being used correctly */ if ((args.pa_flags & FLAG_DO_ANALYSIS) && !(args.pa_flags & (FLAG_HAS_SAMPLING_PMCS|FLAG_READ_LOGFILE))) errx(EX_USAGE, "ERROR: options -g/-G/-m/-T require sampling PMCs or -R to be specified." ); /* check if -e was specified without -g */ if ((args.pa_flags & FLAG_DO_WIDE_GPROF_HC) && !(args.pa_flags & FLAG_DO_GPROF)) errx(EX_USAGE, "ERROR: option -e requires gprof mode to be specified." ); /* check if -O was spuriously specified */ if ((args.pa_flags & FLAG_HAS_OUTPUT_LOGFILE) && (args.pa_required & FLAG_HAS_OUTPUT_LOGFILE) == 0) errx(EX_USAGE, "ERROR: option -O is used only with options -E, -P, -S and -W." ); /* -k kernel path require -g/-G/-m/-T or -R */ if ((args.pa_flags & FLAG_HAS_KERNELPATH) && (args.pa_flags & FLAG_DO_ANALYSIS) == 0 && (args.pa_flags & FLAG_READ_LOGFILE) == 0) errx(EX_USAGE, "ERROR: option -k is only used with -g/-R/-m/-T."); /* -D only applies to gprof output mode (-g) */ if ((args.pa_flags & FLAG_HAS_SAMPLESDIR) && (args.pa_flags & FLAG_DO_GPROF) == 0) errx(EX_USAGE, "ERROR: option -D is only used with -g."); /* -M mapfile requires -g or -R */ if (args.pa_mapfilename != NULL && (args.pa_flags & FLAG_DO_GPROF) == 0 && (args.pa_flags & FLAG_READ_LOGFILE) == 0) errx(EX_USAGE, "ERROR: option -M is only used with -g/-R."); /* * Disallow textual output of sampling PMCs if counting PMCs * have also been asked for, mostly because the combined output * is difficult to make sense of. */ if ((args.pa_flags & FLAG_HAS_COUNTING_PMCS) && (args.pa_flags & FLAG_HAS_SAMPLING_PMCS) && ((args.pa_flags & FLAG_HAS_OUTPUT_LOGFILE) == 0)) errx(EX_USAGE, "ERROR: option -O is required if counting and sampling PMCs are specified together." ); /* * Check if 'kerneldir' refers to a file rather than a * directory. If so, use `dirname path` to determine the * kernel directory. */ (void) snprintf(buffer, sizeof(buffer), "%s%s", args.pa_fsroot, args.pa_kernel); if (stat(buffer, &sb) < 0) err(EX_OSERR, "ERROR: Cannot locate kernel \"%s\"", buffer); if (!S_ISREG(sb.st_mode) && !S_ISDIR(sb.st_mode)) errx(EX_USAGE, "ERROR: \"%s\": Unsupported file type.", buffer); if (!S_ISDIR(sb.st_mode)) { tmp = args.pa_kernel; args.pa_kernel = strdup(dirname(args.pa_kernel)); if (args.pa_kernel == NULL) errx(EX_SOFTWARE, "ERROR: Out of memory"); free(tmp); (void) snprintf(buffer, sizeof(buffer), "%s%s", args.pa_fsroot, args.pa_kernel); if (stat(buffer, &sb) < 0) err(EX_OSERR, "ERROR: Cannot stat \"%s\"", buffer); if (!S_ISDIR(sb.st_mode)) errx(EX_USAGE, "ERROR: \"%s\" is not a directory.", buffer); } /* * If we have a callgraph be created, select the outputfile. 
*/ if (args.pa_flags & FLAG_DO_CALLGRAPHS) { if (strcmp(graphfilename, "-") == 0) args.pa_graphfile = args.pa_printfile; else { args.pa_graphfile = fopen(graphfilename, "w"); if (args.pa_graphfile == NULL) err(EX_OSERR, "ERROR: cannot open \"%s\" for writing", graphfilename); } } if (args.pa_flags & FLAG_DO_ANNOTATE) { args.pa_graphfile = fopen(graphfilename, "w"); if (args.pa_graphfile == NULL) err(EX_OSERR, "ERROR: cannot open \"%s\" for writing", graphfilename); } /* if we've been asked to process a log file, skip init */ - if ((args.pa_flags & FLAG_READ_LOGFILE) == 0) { - if (pmc_init() < 0) - err(EX_UNAVAILABLE, - "ERROR: Initialization of the pmc(3) library failed" - ); - - if ((npmc = pmc_npmc(0)) < 0) /* assume all CPUs are identical */ - err(EX_OSERR, -"ERROR: Cannot determine the number of PMCs on CPU %d", - 0); - } + if ((args.pa_flags & FLAG_READ_LOGFILE) == 0) + libpmc_initialize(&npmc); /* Allocate a kqueue */ if ((pmcstat_kq = kqueue()) < 0) err(EX_OSERR, "ERROR: Cannot allocate kqueue"); /* Setup the logfile as the source. */ if (args.pa_flags & FLAG_READ_LOGFILE) { /* * Print the log in textual form if we haven't been * asked to generate profiling information. */ if ((args.pa_flags & FLAG_DO_ANALYSIS) == 0) args.pa_flags |= FLAG_DO_PRINT; pmcstat_log_initialize_logging(); rfd = pmcstat_open_log(args.pa_inputpath, PMCSTAT_OPEN_FOR_READ); if ((args.pa_logparser = pmclog_open(rfd)) == NULL) err(EX_OSERR, "ERROR: Cannot create parser"); if (fcntl(rfd, F_SETFL, O_NONBLOCK) < 0) err(EX_OSERR, "ERROR: fcntl(2) failed"); EV_SET(&kev, rfd, EVFILT_READ, EV_ADD, 0, 0, NULL); if (kevent(pmcstat_kq, &kev, 1, NULL, 0, NULL) < 0) err(EX_OSERR, "ERROR: Cannot register kevent"); } /* * Configure the specified log file or setup a default log * consumer via a pipe. */ if (args.pa_required & FLAG_HAS_OUTPUT_LOGFILE) { if (args.pa_outputpath) args.pa_logfd = pmcstat_open_log(args.pa_outputpath, PMCSTAT_OPEN_FOR_WRITE); else { /* * process the log on the fly by reading it in * through a pipe. */ if (pipe(pipefd) < 0) err(EX_OSERR, "ERROR: pipe(2) failed"); if (fcntl(pipefd[READPIPEFD], F_SETFL, O_NONBLOCK) < 0) err(EX_OSERR, "ERROR: fcntl(2) failed"); EV_SET(&kev, pipefd[READPIPEFD], EVFILT_READ, EV_ADD, 0, 0, NULL); if (kevent(pmcstat_kq, &kev, 1, NULL, 0, NULL) < 0) err(EX_OSERR, "ERROR: Cannot register kevent"); args.pa_logfd = pipefd[WRITEPIPEFD]; args.pa_flags |= FLAG_HAS_PIPE; if ((args.pa_flags & FLAG_DO_TOP) == 0) args.pa_flags |= FLAG_DO_PRINT; args.pa_logparser = pmclog_open(pipefd[READPIPEFD]); } if (pmc_configure_logfile(args.pa_logfd) < 0) err(EX_OSERR, "ERROR: Cannot configure log file"); } /* remember to check for driver errors if we are sampling or logging */ check_driver_stats = (args.pa_flags & FLAG_HAS_SAMPLING_PMCS) || (args.pa_flags & FLAG_HAS_OUTPUT_LOGFILE); /* if (args.pa_flags & FLAG_READ_LOGFILE) { * Allocate PMCs. */ STAILQ_FOREACH(ev, &args.pa_events, ev_next) { - if (pmc_allocate(ev->ev_spec, ev->ev_mode, + if (ev->ev_pmcid == PMC_ID_INVALID && + pmc_allocate(ev->ev_spec, ev->ev_mode, ev->ev_flags, ev->ev_cpu, &ev->ev_pmcid, ev->ev_count) < 0) err(EX_OSERR, "ERROR: Cannot allocate %s-mode pmc with specification \"%s\"", PMC_IS_SYSTEM_MODE(ev->ev_mode) ? 
"system" : "process", ev->ev_spec); if (PMC_IS_SAMPLING_MODE(ev->ev_mode) && pmc_set(ev->ev_pmcid, ev->ev_count) < 0) err(EX_OSERR, "ERROR: Cannot set sampling count for PMC \"%s\"", ev->ev_name); } /* compute printout widths */ STAILQ_FOREACH(ev, &args.pa_events, ev_next) { int counter_width; int display_width; int header_width; (void) pmc_width(ev->ev_pmcid, &counter_width); header_width = strlen(ev->ev_name) + 2; /* prefix '%c/' */ display_width = (int) floor(counter_width / 3.32193) + 1; if (PMC_IS_SYSTEM_MODE(ev->ev_mode)) header_width += 3; /* 2 digit CPU number + '/' */ if (header_width > display_width) { ev->ev_fieldskip = 0; ev->ev_fieldwidth = header_width; } else { ev->ev_fieldskip = display_width - header_width; ev->ev_fieldwidth = display_width; } } /* * If our output is being set to a terminal, register a handler * for window size changes. */ if (isatty(fileno(args.pa_printfile))) { if (ioctl(fileno(args.pa_printfile), TIOCGWINSZ, &ws) < 0) err(EX_OSERR, "ERROR: Cannot determine window size"); pmcstat_displayheight = ws.ws_row - 1; pmcstat_displaywidth = ws.ws_col - 1; EV_SET(&kev, SIGWINCH, EVFILT_SIGNAL, EV_ADD, 0, 0, NULL); if (kevent(pmcstat_kq, &kev, 1, NULL, 0, NULL) < 0) err(EX_OSERR, "ERROR: Cannot register kevent for SIGWINCH"); args.pa_toptty = 1; } /* * Listen to key input in top mode. */ if (args.pa_flags & FLAG_DO_TOP) { EV_SET(&kev, fileno(stdin), EVFILT_READ, EV_ADD, 0, 0, NULL); if (kevent(pmcstat_kq, &kev, 1, NULL, 0, NULL) < 0) err(EX_OSERR, "ERROR: Cannot register kevent"); } EV_SET(&kev, SIGINT, EVFILT_SIGNAL, EV_ADD, 0, 0, NULL); if (kevent(pmcstat_kq, &kev, 1, NULL, 0, NULL) < 0) err(EX_OSERR, "ERROR: Cannot register kevent for SIGINT"); EV_SET(&kev, SIGIO, EVFILT_SIGNAL, EV_ADD, 0, 0, NULL); if (kevent(pmcstat_kq, &kev, 1, NULL, 0, NULL) < 0) err(EX_OSERR, "ERROR: Cannot register kevent for SIGIO"); /* * An exec() failure of a forked child is signalled by the * child sending the parent a SIGCHLD. We don't register an * actual signal handler for SIGCHLD, but instead use our * kqueue to pick up the signal. */ EV_SET(&kev, SIGCHLD, EVFILT_SIGNAL, EV_ADD, 0, 0, NULL); if (kevent(pmcstat_kq, &kev, 1, NULL, 0, NULL) < 0) err(EX_OSERR, "ERROR: Cannot register kevent for SIGCHLD"); /* * Setup a timer if we have counting mode PMCs needing to be printed or * top mode plugin is active. */ if (((args.pa_flags & FLAG_HAS_COUNTING_PMCS) && (args.pa_required & FLAG_HAS_OUTPUT_LOGFILE) == 0) || (args.pa_flags & FLAG_DO_TOP)) { EV_SET(&kev, 0, EVFILT_TIMER, EV_ADD, 0, args.pa_interval * 1000, NULL); if (kevent(pmcstat_kq, &kev, 1, NULL, 0, NULL) < 0) err(EX_OSERR, "ERROR: Cannot register kevent for timer"); } /* * Setup a duration timer if we have sampling mode PMCs and * a duration time is set */ if ((args.pa_flags & FLAG_HAS_SAMPLING_PMCS) && (args.pa_flags & FLAG_HAS_DURATION)) { EV_SET(&kev, 0, EVFILT_TIMER, EV_ADD, 0, args.pa_duration * 1000, NULL); if (kevent(pmcstat_kq, &kev, 1, NULL, 0, NULL) < 0) err(EX_OSERR, "ERROR: Cannot register kevent for " "time duration"); } /* attach PMCs to the target process, starting it if specified */ if (args.pa_flags & FLAG_HAS_COMMANDLINE) pmcstat_create_process(pmcstat_sockpair, &args, pmcstat_kq); if (check_driver_stats && pmc_get_driver_stats(&ds_start) < 0) err(EX_OSERR, "ERROR: Cannot retrieve driver statistics"); /* Attach process pmcs to the target process. 
*/ if (args.pa_flags & (FLAG_HAS_TARGET | FLAG_HAS_COMMANDLINE)) { if (SLIST_EMPTY(&args.pa_targets)) errx(EX_DATAERR, "ERROR: No matching target processes."); if (args.pa_flags & FLAG_HAS_PROCESS_PMCS) pmcstat_attach_pmcs(&args); if (pmcstat_kvm) { kvm_close(pmcstat_kvm); pmcstat_kvm = NULL; } } /* start the pmcs */ pmcstat_start_pmcs(); /* start the (commandline) process if needed */ if (args.pa_flags & FLAG_HAS_COMMANDLINE) pmcstat_start_process(pmcstat_sockpair); /* initialize logging */ pmcstat_log_initialize_logging(); /* Handle SIGINT using the kqueue loop */ sa.sa_handler = SIG_IGN; sa.sa_flags = 0; (void) sigemptyset(&sa.sa_mask); if (sigaction(SIGINT, &sa, NULL) < 0) err(EX_OSERR, "ERROR: Cannot install signal handler"); /* * Setup the top mode display. */ if (args.pa_flags & FLAG_DO_TOP) { args.pa_flags &= ~FLAG_DO_PRINT; if (args.pa_toptty) { /* * Init ncurses. */ initscr(); if(has_colors() == TRUE) { args.pa_topcolor = 1; start_color(); use_default_colors(); pair_content(0, &cf, &cb); init_pair(1, COLOR_RED, cb); init_pair(2, COLOR_YELLOW, cb); init_pair(3, COLOR_GREEN, cb); } cbreak(); noecho(); nonl(); nodelay(stdscr, 1); intrflush(stdscr, FALSE); keypad(stdscr, TRUE); clear(); /* Get terminal width / height with ncurses. */ getmaxyx(stdscr, pmcstat_displayheight, pmcstat_displaywidth); pmcstat_displayheight--; pmcstat_displaywidth--; atexit(pmcstat_topexit); } } /* * loop till either the target process (if any) exits, or we * are killed by a SIGINT or we reached the time duration. */ runstate = PMCSTAT_RUNNING; do_print = do_read = 0; do { if ((c = kevent(pmcstat_kq, NULL, 0, &kev, 1, NULL)) <= 0) { if (errno != EINTR) err(EX_OSERR, "ERROR: kevent failed"); else continue; } if (kev.flags & EV_ERROR) errc(EX_OSERR, kev.data, "ERROR: kevent failed"); switch (kev.filter) { case EVFILT_PROC: /* target has exited */ runstate = pmcstat_close_log(&args); do_print = 1; break; case EVFILT_READ: /* log file data is present */ if (kev.ident == (unsigned)fileno(stdin) && (args.pa_flags & FLAG_DO_TOP)) { if (pmcstat_keypress_log()) runstate = pmcstat_close_log(&args); } else { do_read = 0; runstate = pmcstat_process_log(); } break; case EVFILT_SIGNAL: if (kev.ident == SIGCHLD) { /* * The child process sends us a * SIGCHLD if its exec() failed. We * wait for it to exit and then exit * ourselves. */ (void) wait(&c); runstate = PMCSTAT_FINISHED; } else if (kev.ident == SIGIO) { /* * We get a SIGIO if a PMC loses all * of its targets, or if logfile * writes encounter an error. 
*/ runstate = pmcstat_close_log(&args); do_print = 1; /* print PMCs at exit */ } else if (kev.ident == SIGINT) { /* Kill the child process if we started it */ if (args.pa_flags & FLAG_HAS_COMMANDLINE) pmcstat_kill_process(); runstate = pmcstat_close_log(&args); } else if (kev.ident == SIGWINCH) { if (ioctl(fileno(args.pa_printfile), TIOCGWINSZ, &ws) < 0) err(EX_OSERR, "ERROR: Cannot determine window size"); pmcstat_displayheight = ws.ws_row - 1; pmcstat_displaywidth = ws.ws_col - 1; } else assert(0); break; case EVFILT_TIMER: /* time duration reached, exit */ if (args.pa_flags & FLAG_HAS_DURATION) { runstate = PMCSTAT_FINISHED; break; } /* print out counting PMCs */ if ((args.pa_flags & FLAG_DO_TOP) && (args.pa_flags & FLAG_HAS_PIPE) && pmc_flush_logfile() == 0) do_read = 1; do_print = 1; break; } if (do_print && !do_read) { if ((args.pa_required & FLAG_HAS_OUTPUT_LOGFILE) == 0) { pmcstat_print_pmcs(); if (runstate == PMCSTAT_FINISHED && /* final newline */ (args.pa_flags & FLAG_DO_PRINT) == 0) (void) fprintf(args.pa_printfile, "\n"); } if (args.pa_flags & FLAG_DO_TOP) pmcstat_display_log(); do_print = 0; } } while (runstate != PMCSTAT_FINISHED); if ((args.pa_flags & FLAG_DO_TOP) && args.pa_toptty) { pmcstat_topexit(); args.pa_toptty = 0; } /* flush any pending log entries */ if (args.pa_flags & (FLAG_HAS_OUTPUT_LOGFILE | FLAG_HAS_PIPE)) pmc_close_logfile(); pmcstat_cleanup(); /* check if the driver lost any samples or events */ if (check_driver_stats) { if (pmc_get_driver_stats(&ds_end) < 0) err(EX_OSERR, "ERROR: Cannot retrieve driver statistics"); if (ds_start.pm_intr_bufferfull != ds_end.pm_intr_bufferfull && args.pa_verbosity > 0) warnx( "WARNING: sampling was paused at least %u time%s.\n" "Please consider tuning the \"kern.hwpmc.nsamples\" tunable.", ds_end.pm_intr_bufferfull - ds_start.pm_intr_bufferfull, ((ds_end.pm_intr_bufferfull - ds_start.pm_intr_bufferfull) != 1) ? "s" : "" ); if (ds_start.pm_buffer_requests_failed != ds_end.pm_buffer_requests_failed && args.pa_verbosity > 0) warnx( "WARNING: at least %u event%s were discarded while running.\n" "Please consider tuning the \"kern.hwpmc.nbuffers_pcpu\" tunable.", ds_end.pm_buffer_requests_failed - ds_start.pm_buffer_requests_failed, ((ds_end.pm_buffer_requests_failed - ds_start.pm_buffer_requests_failed) != 1) ? "s" : "" ); } exit(EX_OK); }
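
A note for readers following the long comment at the top of pmcstat.c: the start-up protocol for a measured command is a socketpair handshake in which the child announces itself, parks until the parent has attached and started the PMCs against its pid, and only then calls exec(). The sketch below is a minimal, stand-alone illustration of that idea and not pmcstat's actual code; run_child(), start_measured_command() and the one-byte tokens are invented names.

#include <sys/types.h>
#include <sys/socket.h>

#include <err.h>
#include <unistd.h>

/* Child side: announce readiness, wait for the go-ahead, then exec(). */
static void
run_child(int sv[2], char **argv)
{
        char token = 'T', go;

        (void)close(sv[0]);
        (void)write(sv[1], &token, 1);  /* "I have been forked" */
        (void)read(sv[1], &go, 1);      /* block until PMCs are attached */
        execvp(argv[0], argv);
        _exit(127);                     /* exec failed; parent sees SIGCHLD */
}

/* Parent side: fork, then attach/start PMCs while the child is parked. */
static pid_t
start_measured_command(char **argv)
{
        int sv[2];
        char token, go = 'G';
        pid_t pid;

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
                err(1, "socketpair");
        if ((pid = fork()) < 0)
                err(1, "fork");
        if (pid == 0)
                run_child(sv, argv);    /* does not return */

        (void)close(sv[1]);
        (void)read(sv[0], &token, 1);   /* wait for the child's token */
        /* ... pmc_attach() process-scope PMCs to pid, pmc_start() them ... */
        (void)write(sv[0], &go, 1);     /* let the child exec() */
        return (pid);
}

An exec() failure makes the child exit, which the parent observes as SIGCHLD; pmcstat picks that signal up through its kqueue rather than through a signal handler.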
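
Counting-mode operation, as implemented by pmcstat_print_counters() above, reduces to calling pmc_read(3) on each timer tick and printing either the cumulative value or the delta against the previously saved reading. A stripped-down, stand-alone version of that loop is sketched below; the event name "instructions" is only a placeholder (use pmcstat -L to see what the host actually provides), and allocating a system-scope PMC normally requires privilege.

#include <err.h>
#include <pmc.h>
#include <stdint.h>
#include <stdio.h>
#include <sysexits.h>
#include <unistd.h>

int
main(void)
{
        pmc_id_t pmcid;
        pmc_value_t value, saved = 0;

        if (pmc_init() < 0)
                err(EX_UNAVAILABLE, "pmc_init");
        /* "instructions" is a placeholder event specification. */
        if (pmc_allocate("instructions", PMC_MODE_SC, 0, 0, &pmcid, 0) < 0)
                err(EX_OSERR, "pmc_allocate");
        if (pmc_start(pmcid) < 0)
                err(EX_OSERR, "pmc_start");

        for (int i = 0; i < 5; i++) {
                sleep(1);
                if (pmc_read(pmcid, &value) < 0)
                        err(EX_OSERR, "pmc_read");
                /* Cumulative value and delta since the last tick. */
                printf("%ju (+%ju)\n", (uintmax_t)value,
                    (uintmax_t)(value - saved));
                saved = value;
        }

        pmc_stop(pmcid);
        pmc_release(pmcid);
        return (0);
}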
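
The functional core of this pmcstat change is the capability check on system-scope events: if the allocated counter reports PMC_CAP_SYSWIDE (a capability bit introduced with this series, alongside PMC_CAP_DOMWIDE), no per-CPU clones are needed at all; if it reports PMC_CAP_DOMWIDE, one instance per NUMA domain is enough, and domains are enumerated through the vm.ndomains sysctl and cpuset_getaffinity(2) with CPU_WHICH_DOMAIN. The helper below is an illustrative condensation of that logic, not code from the patch; plan_system_pmcs() is an invented name and error paths are kept minimal.

#include <sys/param.h>
#include <sys/cpuset.h>
#include <sys/sysctl.h>

#include <assert.h>
#include <err.h>
#include <pmc.h>
#include <stdint.h>
#include <sysexits.h>

/*
 * Allocate one system-scope counting PMC for "spec" on the lowest CPU
 * in *cpumask, then shrink *cpumask to the CPUs that still need their
 * own instance, based on the counter's capabilities.
 */
static void
plan_system_pmcs(const char *spec, cpuset_t *cpumask)
{
        cpuset_t dommask;
        pmc_id_t pmcid;
        uint32_t caps;
        size_t len;
        int domains, i;

        assert(!CPU_EMPTY(cpumask));
        if (pmc_init() < 0)
                err(EX_UNAVAILABLE, "pmc_init");
        if (pmc_allocate(spec, PMC_MODE_SC, 0, CPU_FFS(cpumask) - 1,
            &pmcid, 0) < 0)
                err(EX_OSERR, "cannot allocate \"%s\"", spec);
        if (pmc_capabilities(pmcid, &caps) < 0)
                err(EX_OSERR, "cannot query capabilities");

        if (caps & PMC_CAP_SYSWIDE) {
                /* One counter already covers the whole system. */
                CPU_ZERO(cpumask);
                return;
        }
        if (caps & PMC_CAP_DOMWIDE) {
                /*
                 * One counter per NUMA domain suffices; keep one
                 * representative CPU from each domain beyond the one
                 * already covered (domain 0 here, for simplicity).
                 */
                len = sizeof(domains);
                if (sysctlbyname("vm.ndomains", &domains, &len,
                    NULL, 0) < 0)
                        err(EX_OSERR, "vm.ndomains");
                CPU_ZERO(cpumask);
                for (i = 1; i < domains; i++) {
                        CPU_ZERO(&dommask);
                        if (cpuset_getaffinity(CPU_LEVEL_WHICH,
                            CPU_WHICH_DOMAIN, i, sizeof(dommask),
                            &dommask) < 0)
                                err(EX_OSERR, "cpuset_getaffinity");
                        CPU_SET(CPU_FFS(&dommask) - 1, cpumask);
                }
        }
        /* Otherwise leave *cpumask alone: every CPU gets its own PMC. */
}

pmcstat itself keeps the first allocation and then clones the event descriptor across the reduced cpumask, which is why the patch resets ev_pmcid to PMC_ID_INVALID around pmcstat_clone_event_descriptor(): the clones are allocated later by the normal allocation loop, which now skips events that already hold a valid PMC id.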
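
One detail of the printout-width computation in main() deserves a note: pmc_width(3) reports the counter width in bits, and the constant 3.32193 is log2(10), so floor(counter_width / 3.32193) + 1 is the number of decimal digits a full counter can occupy. For example, a 48-bit counter can reach 2^48 (about 2.8 * 10^14), and floor(48 / 3.32193) + 1 = 14 + 1 = 15 digits, which is exactly enough.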
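
When no -O file is given but a log is required, pmcstat processes the log "on the fly": the kernel writes into one end of a pipe handed over with pmc_configure_logfile(3), and the other end is parsed with the pmclog(3) API whenever kqueue reports it readable. The following is a rough sketch of that plumbing, assuming pmc_init() has already succeeded; setup_live_log() and drain_log() are invented helper names, and real code would also dispatch on lev.pl_type.

#include <err.h>
#include <fcntl.h>
#include <pmc.h>
#include <pmclog.h>
#include <sysexits.h>
#include <unistd.h>

/* Create the pipe, give the write end to the kernel, parse the read end. */
static void *
setup_live_log(int *readfd)
{
        int fds[2];
        void *parser;

        if (pipe(fds) < 0)
                err(EX_OSERR, "pipe");
        if (fcntl(fds[0], F_SETFL, O_NONBLOCK) < 0)
                err(EX_OSERR, "fcntl");
        if (pmc_configure_logfile(fds[1]) < 0)
                err(EX_OSERR, "pmc_configure_logfile");
        if ((parser = pmclog_open(fds[0])) == NULL)
                err(EX_OSERR, "pmclog_open");
        *readfd = fds[0];       /* register this fd with the kqueue */
        return (parser);
}

/* Called whenever the kqueue reports the read end as readable. */
static void
drain_log(void *parser)
{
        struct pmclog_ev lev;

        while (pmclog_read(parser, &lev) == 0) {
                /* lev.pl_type identifies the record kind. */
        }
        if (lev.pl_state != PMCLOG_REQUIRE_DATA && lev.pl_state != PMCLOG_EOF)
                errx(EX_DATAERR, "log parse error");
}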
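
Everything after allocation is driven by a single kqueue: the log pipe, stdin in top mode, SIGINT/SIGIO/SIGCHLD/SIGWINCH and the interval/duration timers are all just kevent sources. The stand-alone sketch below shows the registration pattern used above, in particular that the signal disposition is set to SIG_IGN so delivery happens only through EVFILT_SIGNAL (which reports signals even when they are ignored) and that EVFILT_TIMER periods default to milliseconds.

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

#include <err.h>
#include <signal.h>
#include <stdio.h>

int
main(void)
{
        struct kevent kev;
        int kq, n;

        if ((kq = kqueue()) < 0)
                err(1, "kqueue");

        /* Deliver SIGINT through the kqueue instead of a handler. */
        signal(SIGINT, SIG_IGN);
        EV_SET(&kev, SIGINT, EVFILT_SIGNAL, EV_ADD, 0, 0, NULL);
        if (kevent(kq, &kev, 1, NULL, 0, NULL) < 0)
                err(1, "kevent: SIGINT");

        /* Fire every 1000 ms, like pmcstat's -w interval timer. */
        EV_SET(&kev, 0, EVFILT_TIMER, EV_ADD, 0, 1000, NULL);
        if (kevent(kq, &kev, 1, NULL, 0, NULL) < 0)
                err(1, "kevent: timer");

        for (;;) {
                n = kevent(kq, NULL, 0, &kev, 1, NULL);
                if (n < 0)
                        err(1, "kevent");
                if (n == 0)
                        continue;
                if (kev.filter == EVFILT_SIGNAL && kev.ident == SIGINT)
                        break;
                if (kev.filter == EVFILT_TIMER)
                        printf("tick\n");
        }
        return (0);
}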