The idea here is to avoid a memory access and conditional branch per
probe site. Instead, the probe is represented by an "unreachable"
unconditional function call. asm goto is used to store the address of
the probe site (represented by a no-op sled) and the address of the
function call into a tracepoint record. Each SDT probe carries a list
of tracepoints.
When the probe is enabled, the no-op sled corresponding to each
tracepoint is overwritten with a jmp to the corresponding label. The
implementation uses smp_rendezvous() to park all other CPUs while the
instruction is being overwritten, as this can't be done atomically in
general.
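The patching step can be illustrated in userspace. The following is only
a sketch under stated assumptions, not the code from this change: the
struct layout, the names tracepoint_patch() and probe_set_enabled(), and
the use of a plain byte buffer in place of live kernel text are all
hypothetical. It assumes x86, where a 5-byte no-op can be replaced by a
jmp rel32 (opcode 0xe9) whose displacement is relative to the end of the
instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLED_LEN 5	/* x86 jmp rel32: opcode byte + 32-bit displacement */

/* Hypothetical record layout; offsets index into a text buffer. */
struct tracepoint {
	size_t patchpoint;		/* start of the no-op sled */
	size_t target;			/* out-of-line probe call */
	struct tracepoint *next;	/* next tracepoint of this probe */
};

struct probe {
	const char *name;
	struct tracepoint *tracepoints;
};

/* The 5-byte x86 no-op: nopl 0x0(%rax,%rax,1). */
static const uint8_t nop5[SLED_LEN] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

/* Rewrite one sled as "jmp target", or restore the no-op. */
static void
tracepoint_patch(uint8_t *text, size_t len, const struct tracepoint *tp,
    int enable)
{
	uint8_t insn[SLED_LEN];
	uint32_t disp;

	assert(tp->patchpoint + SLED_LEN <= len && tp->target < len);
	if (enable) {
		/* Displacement is measured from the end of the jmp. */
		disp = (uint32_t)(tp->target - (tp->patchpoint + SLED_LEN));
		insn[0] = 0xe9;
		insn[1] = disp & 0xff;		/* little-endian rel32 */
		insn[2] = (disp >> 8) & 0xff;
		insn[3] = (disp >> 16) & 0xff;
		insn[4] = (disp >> 24) & 0xff;
	} else {
		memcpy(insn, nop5, sizeof(insn));
	}
	/*
	 * This multi-byte store is not atomic, which is why a kernel
	 * would have to park the other CPUs around it.
	 */
	memcpy(text + tp->patchpoint, insn, sizeof(insn));
}

/* Enabling a probe walks its list and patches every tracepoint. */
static void
probe_set_enabled(uint8_t *text, size_t len, const struct probe *p,
    int enable)
{
	const struct tracepoint *tp;

	for (tp = p->tracepoints; tp != NULL; tp = tp->next)
		tracepoint_patch(text, len, tp, enable);
}
```

With a sled at offset 8 and a target at offset 40, enabling writes 0xe9
followed by the displacement 40 - (8 + 5) = 27; disabling restores the
original five no-op bytes.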
I verified that LLVM 17 moves argument marshalling code and the
sdt_probe() function call out-of-line, i.e., to the end of the function.
Per gallatin@ in D43504, this approach has less overhead when probes are
disabled. To make the implementation simpler, I removed support for
probes with 7 arguments; nothing makes use of this except a regression
test case. I also didn't implement this for 32-bit powerpc since I
wasn't able to figure out how to boot it in QEMU.
I have a couple of follow-up patches which take this further:
1. We can now fill out the "function" field of SDT probe names
   automatically, since we know exactly where each tracepoint is
   located.
2. We can put additional code between the asm goto target label and the
   probe itself. This lets us perform some probe-specific argument
   marshalling without any overhead when the probe is disabled. For
   example:
```
if (SDT_PROBES_ENABLED()) {
	int reason = CLD_EXITED;
	if (WCOREDUMP(signo))
		reason = CLD_DUMPED;
	else if (WIFSIGNALED(signo))
		reason = CLD_KILLED;
	SDT_PROBE1(proc, , , exit, reason);
}
```
becomes
```
SDT_PROBE1_EXP(proc, , , exit, reason,
	int reason;
	reason = CLD_EXITED;
	if (WCOREDUMP(signo))
		reason = CLD_DUMPED;
	else if (WIFSIGNALED(signo))
		reason = CLD_KILLED;
);
```
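To show how marshalling code can sit behind the asm goto label, here is
a rough userspace sketch. It is not the SDT_PROBE1_EXP implementation:
PROBE1_EXP_SKETCH, fake_probe1() and exit_site() are hypothetical names,
the asm body is empty rather than a real no-op sled, and no tracepoint
record is emitted, so the label is never actually reached here. It
assumes GCC or Clang (asm goto and local __label__ declarations).

```c
#include <stdio.h>

static int marshal_runs;	/* executions of the marshalling code */
static int probe_fires;		/* calls to the stand-in probe function */

/* Hypothetical stand-in for the real probe entry point. */
static void
fake_probe1(int arg0)
{
	probe_fires++;
	printf("probe fired, arg0=%d\n", arg0);
}

/*
 * The empty asm goto marks the would-be patch point. It always falls
 * through, so the marshalling statements (the variadic arguments) and
 * the probe call after probe_site are skipped unless the patch point
 * is rewritten into a jump to that label.
 */
#define PROBE1_EXP_SKETCH(arg0, ...) do {				\
	__label__ probe_site, probe_done;				\
	__asm__ goto("" : : : : probe_site);				\
	goto probe_done;						\
probe_site: {								\
		__VA_ARGS__						\
		fake_probe1(arg0);					\
	}								\
probe_done: ;								\
} while (0)

/* Example call site; the reason values are placeholders. */
static void
exit_site(int status)
{
	PROBE1_EXP_SKETCH(reason,
		int reason;
		marshal_runs++;
		reason = (status != 0) ? 2 : 1;
	);
}
```

As long as nothing patches the asm goto site, calling exit_site() leaves
both marshal_runs and probe_fires at zero, which is the "no overhead
when disabled" property in miniature. A real implementation would rely
on the compiler to lay out the cold block out-of-line instead of using
an explicit goto to skip it.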
In the future I would like to use this mechanism more generally, e.g.,
to remove branches and marshalling code used by hwpmc, and generally to
make it easier to add new tracepoint consumers without having to add
more conditional branches to hot code paths.