Using per-CPU and per-thread trampolines is expensive and error-prone,
since we're rewriting the same memory blocks constantly. Per-probe
trampolines solve this problem by giving each probe its own block of
executable memory, which more or less remains the same after the initial
write.
What this patch does, is get rid of the initialization code which
allocates a trampoline for each thread, and instead let each port of
kinst allocate a trampoline for each new probe created. It also sets up
the infrastructure needed to support the new trampoline scheme.
This change is not currently supported on amd64, as the amd64 port needs
further changes to work, so this is a temporary/gradual patch to fix the
riscv and arm64 ports.