Paths

Table of Contentst

Differential D19273

Fix a hash table lookup race in fasttrap_pid_probe().
ClosedPublic
Actions

Authored by markj on Feb 20 2019, 9:26 PM.

Details

Reviewers

cem
gnn

Group Reviewers

DTrace

Commits

rS344452: Fix a tracepoint lookup race in fasttrap_pid_probe().

Summary

The kernel component of userland dtrace, fasttrap, hooks the breakpoint
handler and uses a hash table, keyed by PID and PC, to look up the probe
structure for a given tracepoint. The hash table lookup is subject to
the following race:

Thread 1:
Hits a userspace breakpoint, traps into the kernel.
Enables interrupts.

Thread 2:
Uninstalls the breakpoint from thread 1's address space.
Removes the tracepoint structure from the hash table.

Thread 1:
Hash table lookup fails.

In this case, thread 1 concludes that something else caused the
breakpoint trap. Since it is not being traced anymore, i.e., there's no
debugger to forward the trap to, the kernel sends SIGTRAP to thread 1,
killing it.

illumos solves this problem by rereading the trapping instruction upon a
hash table lookup failure. If that instruction doesn't look like it
should have caused a breakpoint, the breakpoint is "consumed" and the
thread is set up to retry the instruction. I don't really like that
solution: it's machine dependent and not forwards-compatible. That is,
if Intel ever adds some new mechanism for raising #BP and fasttrap
doesn't know about it, it could cause the target thread to enter an
infinite loop.

Another solution is to have dtrace(1) stop the target process, remove
tracepoints from its address space, and resume the process before
removing all probes. I started implementing this, but it is
complicated: a new ioctl is needed to remove tracepoints, there are lots
of error cases, etc..

Solve the race using a per-process generation counter which gets updated
before a tracepoint is removed from the hash table, and after it is
removed from the target process' vmspace. When thread 1's lookup fails,
it compares its copy of the counter with the process' value and retries
the #BP instruction if they differ. If the breakpoint was caused by a
vanishing fasttrap tracepoint, this ensures that we will retry the
instruction rather than sending SIGTRAP. If the breakpoint was not
caused by DTrace, fasttrap will consume the breakpoint (and retry) at
most once so long as dtrace(1) is not actively removing probes. If it
is removing probes, then it has just ptrace(PT_DETACH)ed the target
process, so it is quite unlikely that a debugger has attached and
installed breakpoints of its own in that small window.

Test Plan

Test suite.

It's possible to reproduce this by, for example, calling strlen()
in a loop, probing every instruction in strlen(), and killing dtrace(1).
With this patch applied I wasn't able to trigger any problems.

Diff Detail

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 22627
Build 21738: arc lint + arc unit

Event Timeline

markj created this revision.Feb 20 2019, 9:26 PM

Harbormaster completed remote builds in B22627: Diff 54153.Feb 20 2019, 9:26 PM

markj edited the test plan for this revision. (Show Details)Feb 20 2019, 9:39 PM

markj added a reviewer: DTrace.

I wasn't careful about preserving #ifdefs here: fasttrap has been virtually untouched upstream for the past 5 years, so once this commit is in I'm going to get rid of illumos ifdefs there in an attempt to make the code a bit easier to read.

cem added a reviewer: cem.Feb 20 2019, 11:11 PM

In D19273#412458, @markj wrote:

I wasn't careful about preserving #ifdefs here: fasttrap has been virtually untouched upstream for the past 5 years, so once this commit is in I'm going to get rid of illumos ifdefs there in an attempt to make the code a bit easier to read.

Sounds good to me.

Solve the race using a per-process generation counter which gets updated
before a tracepoint is removed from the hash table, and after it is
removed from the target process' vmspace.

We do the latter, but I think we're missing the former.

sys/cddl/contrib/opensolaris/uts/common/dtrace/fasttrap.c
201	The global appears to be unused?
sys/cddl/contrib/opensolaris/uts/intel/dtrace/fasttrap_isa.c
1011–1030	I'm not sure this assumption expressed in the (probably Illumos) comment holds for FreeBSD. For example, I think rfork(2) can create shared a vmspace child process that runs at the same time as its parent, e.g., `RFMEM \| RFPROC` and `rfork_thread(3)`. The traversal of `p_pptr` with `proctree_lock` held is fine, but the assumption that `pp` will remain valid after dropping that lock maybe isn't. Perhaps PHOLD() is acceptable? In that scenario, I think `pp` can exit after we drop `proctree_lock` but before we check `p_fasttrap_tp_gen`. We're not going to crash, since `proc` is type stable, and I think it's (?)somewhat unlikely we'll get a false positive. But the possibility exists, I guess. (We could avoid the PHOLD if we could safely check that the parent process was suspended or only running this thread? Or perhaps we could just go ahead and use the cached `p_pid` as a heuristic.)
1057–1058	Note that neither of these fields is within `startzero`/`endzero` nor `startcopy`/`endcopy`; they're never necessarily initialized to a good value, AFAICT.

This revision now requires changes to proceed.Feb 21 2019, 2:01 AM

Remove global.
Make sure that the parent doesn't go away during the lookup.

Harbormaster completed remote builds in B22635: Diff 54172.Feb 21 2019, 3:05 AM

markj added inline comments.Feb 21 2019, 3:05 AM

sys/cddl/contrib/opensolaris/uts/intel/dtrace/fasttrap_isa.c
1011–1030	This code is all pretty broken in such a scenario. For instance, the parent may already have exited and reparented the child by this point, so the tracepoint lookup would certainly fail. Or, the probes may have been created in the child rather than the parent, so the parent will hit them but never find them in the table. I think using PHOLD() is fine. We only need it if the process shares its vmspace with its parent, which is hopefully a rare scenario. We could do something like abort (return -1) if the process and its parent share a vmspace and P_PPWAIT is not set in the child, but that seems more fragile.
1057–1058	They are allocated separately and initialized to 0. See kdtrace_thread_ctor().

In D19273#412540, @cem wrote:

In D19273#412458, @markj wrote:

Solve the race using a per-process generation counter which gets updated
before a tracepoint is removed from the hash table, and after it is
removed from the target process' vmspace.

We do the latter, but I think we're missing the former.

Why? The code immediately after the increment updates the hash table.

In D19273#412556, @markj wrote:

In D19273#412540, @cem wrote:

In D19273#412458, @markj wrote:

Solve the race using a per-process generation counter which gets updated
before a tracepoint is removed from the hash table, and after it is
removed from the target process' vmspace.

We do the latter, but I think we're missing the former.

Why? The code immediately after the increment updates the hash table.

Ah, sorry, I misread that sentence / missed the context. For some reason, I thought we would have two increments around either side of the vmspace removal -> hash modification.

Ok, I think this should work. Thanks again.

sys/cddl/contrib/opensolaris/uts/common/dtrace/fasttrap.c
1063–1069	This comment is now stale(r).
1100–1105	what's the locking scheme here? seems like this side uses bucket->ftb_mtx mutex but the read side uses fasttrap_tp_lock (rmlock). I guess this is sort of an RCU-like postponed free/cleanup; for now we're just removing it from the list, membership change managed by the mutex. in that case, though, why do we need the rmlock to read the list?
1109	This is implemented as sfence. Doesn't it need a full mfence?
sys/cddl/contrib/opensolaris/uts/intel/dtrace/fasttrap_isa.c
1011–1030	PHOLD is fine as long as it isn't too much additional overhead. Given we can avoid it 99.9% of the time, I think it's ok. Thanks!
1057–1058	Oh, I missed that, sorry. Thanks.
1062–1063	To be a little more specific, we only bump `p_fasttrap_tp_gen` when a tracepoint was removed associated with this process (`pp`) was removed from the hash.

This revision is now accepted and ready to land.Feb 21 2019, 3:52 AM

markj marked an inline comment as done.Feb 21 2019, 4:10 AM

markj added inline comments.

sys/cddl/contrib/opensolaris/uts/common/dtrace/fasttrap.c
1063–1069	It's definitely bogus, but I don't see how this diff changes anything.
1100–1105	Yeah, they implemented a poor man's RCU. See fasttrap_mod_barrier(): before freeing a tracepoint structure you need to acquire the fasttrap_tp_lock write lock.
1109	It's supposed to ensure that stores before and after the barrier aren't reordered, which is what sfence does. In fact, sfence is overkill here since stores to writeback memory are already ordered on amd64.

Make a comment more accurate.

This revision now requires review to proceed.Feb 21 2019, 4:11 AM

Harbormaster completed remote builds in B22637: Diff 54173.Feb 21 2019, 4:11 AM

cem accepted this revision.Feb 21 2019, 4:38 AM

cem added inline comments.

sys/cddl/contrib/opensolaris/uts/common/dtrace/fasttrap.c
1063–1069	Yeah, not a regression introduced in this change. Maybe channeling bde too much, sorry.
1100–1105	Got it.
1109	Why is store non-reordering sufficient? We don't want loads being reordered before the barrier either. (Practically: on x86, with WB memory, dependent loads are not reordered before stores either, and this is moot.) I guess our fasttrap implementations are x86-only, but e.g. FBT nominally supports non-x86 with weaker memory models, and uses the (?)same primitives. We're getting off into the weeds; orthogonal to this change.

This revision is now accepted and ready to land.Feb 21 2019, 4:38 AM

markj added inline comments.Feb 21 2019, 4:03 PM

sys/cddl/contrib/opensolaris/uts/common/dtrace/fasttrap.c
1109	It's a fair question, I don't want to claim that all the synchronization in fasttrap is sound. I spent some time trying to understand the purpose of the barriers and have failed. There are no corresponding membar_consumer()s in fasttrap, and all the barriers are issued while a bucket lock is held. Here, I thought the purpose of the barrier was to order the stores to *pp and ftp_gen, but that doesn't make much sense when you look at how ftp_gen is used. So I'm pretty confused.