Change Details

The idea here is to avoid a memory access and conditional branch per probe site. Instead, the probe is represented by an "unreachable" unconditional function call. asm goto is used to store the address of the probe site (represented by a no-op sled) and the address of the function call into a tracepoint record. Each SDT probe carries a list of tracepoints. When the probe is enabled, the no-op sled corresponding to each tracepoint is overwritten with a jmp to the corresponding label.

The implementation uses smp_rendezvous() to park all other CPUs while the instruction is being overwritten, as this can't be done atomically in general.

I verified that llvm 17 moves argument marshalling code and the sdt_probe() function call out-of-line, i.e., to the end of the function.

Per gallatin@ in D43504, this approach has less overhead when probes are disabled.

To make the implementation simpler, I removed support for probes with 7 arguments; nothing makes use of this except a regression test case. I also didn't implement this for 32-bit powerpc since I wasn't able to figure out how to boot it in QEMU.

I have a couple of follow-up patches which take this further:

1. We can now fill out the "function" field of SDT probe names automatically, since we know exactly where each tracepoint is located.
2. We can put additional code between the asm goto target label and the probe itself. This lets us perform some probe-specific argument marshalling without any overhead when the probe is disabled.
For example:

```
if (SDT_PROBES_ENABLED()) {
    int reason = CLD_EXITED;

    if (WCOREDUMP(signo))
        reason = CLD_DUMPED;
    else if (WIFSIGNALED(signo))
        reason = CLD_KILLED;
    SDT_PROBE1(proc, , , exit, reason);
}
```

becomes

```
SDT_PROBE1_EXP(proc, , , exit, reason,
    int reason;

    reason = CLD_EXITED;
    if (WCOREDUMP(signo))
        reason = CLD_DUMPED;
    else if (WIFSIGNALED(signo))
        reason = CLD_KILLED;
);
```

In the future I would like to use this mechanism more generally, e.g., to remove branches and marshalling code used by hwpmc, and generally to make it easier to add new tracepoint consumers without having to add more conditional branches to hot code paths.
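
To make the enable/disable step from the summary concrete, here is a minimal sketch of walking a probe's tracepoint list and overwriting each no-op sled with a jmp to its label. It is not code from the patch. It assumes x86-64 encodings (a 5-byte no-op sled and `jmp rel32`), and the names `struct tracepoint`, `struct probe`, `tracepoint_patch()` and `probe_set_enabled()` are hypothetical. It also patches a plain byte buffer that stands in for kernel text, so it leaves out making the text writable and parking the other CPUs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: the sled is exactly as long as an x86-64 "jmp rel32". */
#define SLED_LEN 5

/* Hypothetical tracepoint record: where to patch and where to jump. */
struct tracepoint {
	uint8_t *patchpoint;		/* start of the no-op sled */
	uint8_t *target;		/* label in front of the out-of-line call */
	struct tracepoint *next;	/* next tracepoint of the same probe */
};

/* Hypothetical probe: carries the list of its tracepoints. */
struct probe {
	struct tracepoint *tracepoints;
};

/* 5-byte x86-64 no-op: nopl 0x0(%rax,%rax,1). */
static const uint8_t nop5[SLED_LEN] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

/* Overwrite one sled with "jmp target" (enable) or restore the no-op. */
void
tracepoint_patch(struct tracepoint *tp, int enable)
{
	uint8_t insn[SLED_LEN];
	int64_t disp;
	int32_t rel;
	int i;

	if (enable) {
		/* rel32 is measured from the end of the jmp instruction. */
		disp = tp->target - (tp->patchpoint + SLED_LEN);
		assert(disp >= INT32_MIN && disp <= INT32_MAX);
		rel = (int32_t)disp;
		insn[0] = 0xe9;		/* jmp rel32 opcode */
		for (i = 0; i < 4; i++)	/* little-endian displacement */
			insn[1 + i] = (uint8_t)((uint32_t)rel >> (8 * i));
	} else {
		memcpy(insn, nop5, sizeof(insn));
	}
	/* In this simulation the "text" is an ordinary writable buffer. */
	memcpy(tp->patchpoint, insn, sizeof(insn));
}

/* Enable or disable a probe by patching every one of its tracepoints. */
void
probe_set_enabled(struct probe *p, int enable)
{
	struct tracepoint *tp;

	for (tp = p->tracepoints; tp != NULL; tp = tp->next)
		tracepoint_patch(tp, enable);
}
```

The final `memcpy()` in `tracepoint_patch()` is the write that cannot be done atomically in general. In the kernel that store would have to happen while the other CPUs are parked, which is what the summary says smp_rendezvous() is used for.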
