Page MenuHomeFreeBSD

Fix a hash table lookup race in fasttrap_pid_probe().
ClosedPublic

Authored by markj on Feb 20 2019, 9:26 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Jan 17, 9:21 PM
Unknown Object (File)
Sun, Jan 12, 3:57 PM
Unknown Object (File)
Sat, Jan 11, 11:14 AM
Unknown Object (File)
Fri, Jan 10, 7:28 AM
Unknown Object (File)
Sun, Dec 29, 11:50 PM
Unknown Object (File)
Dec 21 2024, 3:54 AM
Unknown Object (File)
Dec 20 2024, 6:19 PM
Unknown Object (File)
Dec 20 2024, 6:09 PM
Subscribers

Details

Summary

The kernel component of userland dtrace, fasttrap, hooks the breakpoint
handler and uses a hash table, keyed by PID and PC, to look up the probe
structure for a given tracepoint. The hash table lookup is subject to
the following race:

Thread 1:
Hits a userspace breakpoint, traps into the kernel.
Enables interrupts.

Thread 2:
Uninstalls the breakpoint from thread 1's address space.
Removes the tracepoint structure from the hash table.

Thread 1:
Hash table lookup fails.

In this case, thread 1 concludes that something else caused the
breakpoint trap. Since it is not being traced anymore, i.e., there's no
debugger to forward the trap to, the kernel sends SIGTRAP to thread 1,
killing it.

illumos solves this problem by rereading the trapping instruction upon a
hash table lookup failure. If that instruction doesn't look like it
should have caused a breakpoint, the breakpoint is "consumed" and the
thread is set up to retry the instruction. I don't really like that
solution: it's machine dependent and not forwards-compatible. That is,
if Intel ever adds some new mechanism for raising #BP and fasttrap
doesn't know about it, it could cause the target thread to enter an
infinite loop.

Another solution is to have dtrace(1) stop the target process, remove
tracepoints from its address space, and resume the process before
removing all probes. I started implementing this, but it is
complicated: a new ioctl is needed to remove tracepoints, there are lots
of error cases, etc..

Solve the race using a per-process generation counter which gets updated
before a tracepoint is removed from the hash table, and after it is
removed from the target process' vmspace. When thread 1's lookup fails,
it compares its copy of the counter with the process' value and retries
the #BP instruction if they differ. If the breakpoint was caused by a
vanishing fasttrap tracepoint, this ensures that we will retry the
instruction rather than sending SIGTRAP. If the breakpoint was not
caused by DTrace, fasttrap will consume the breakpoint (and retry) at
most once so long as dtrace(1) is not actively removing probes. If it
is removing probes, then it has just ptrace(PT_DETACH)ed the target
process, so it is quite unlikely that a debugger has attached and
installed breakpoints of its own in that small window.

Test Plan

Test suite.

It's possible to reproduce this by, for example, calling strlen()
in a loop, probing every instruction in strlen(), and killing dtrace(1).
With this patch applied I wasn't able to trigger any problems.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

markj added a reviewer: DTrace.

I wasn't careful about preserving #ifdefs here: fasttrap has been virtually untouched upstream for the past 5 years, so once this commit is in I'm going to get rid of illumos ifdefs there in an attempt to make the code a bit easier to read.

cem requested changes to this revision.Feb 21 2019, 2:01 AM

I wasn't careful about preserving #ifdefs here: fasttrap has been virtually untouched upstream for the past 5 years, so once this commit is in I'm going to get rid of illumos ifdefs there in an attempt to make the code a bit easier to read.

Sounds good to me.

Solve the race using a per-process generation counter which gets updated
before a tracepoint is removed from the hash table, and after it is
removed from the target process' vmspace.

We do the latter, but I think we're missing the former.

sys/cddl/contrib/opensolaris/uts/common/dtrace/fasttrap.c
201 ↗(On Diff #54153)

The global appears to be unused?

sys/cddl/contrib/opensolaris/uts/intel/dtrace/fasttrap_isa.c
1011–1030 ↗(On Diff #54153)

I'm not sure this assumption expressed in the (probably Illumos) comment holds for FreeBSD. For example, I think rfork(2) can create shared a vmspace child process that runs at the same time as its parent, e.g., RFMEM | RFPROC and rfork_thread(3). The traversal of p_pptr with proctree_lock held is fine, but the assumption that pp will remain valid after dropping that lock maybe isn't. Perhaps PHOLD() is acceptable?

In that scenario, I think pp can exit after we drop proctree_lock but before we check p_fasttrap_tp_gen. We're not going to crash, since proc is type stable, and I think it's (?)somewhat unlikely we'll get a false positive. But the possibility exists, I guess.

(We could avoid the PHOLD if we could safely check that the parent process was suspended or only running this thread? Or perhaps we could just go ahead and use the cached p_pid as a heuristic.)

1057–1058 ↗(On Diff #54153)

Note that neither of these fields is within startzero/endzero nor startcopy/endcopy; they're never necessarily initialized to a good value, AFAICT.

This revision now requires changes to proceed.Feb 21 2019, 2:01 AM
markj marked an inline comment as done.
  • Remove global.
  • Make sure that the parent doesn't go away during the lookup.
sys/cddl/contrib/opensolaris/uts/intel/dtrace/fasttrap_isa.c
1011–1030 ↗(On Diff #54153)

This code is all pretty broken in such a scenario. For instance, the parent may already have exited and reparented the child by this point, so the tracepoint lookup would certainly fail. Or, the probes may have been created in the child rather than the parent, so the parent will hit them but never find them in the table.

I think using PHOLD() is fine. We only need it if the process shares its vmspace with its parent, which is hopefully a rare scenario. We could do something like abort (return -1) if the process and its parent share a vmspace and P_PPWAIT is not set in the child, but that seems more fragile.

1057–1058 ↗(On Diff #54153)

They are allocated separately and initialized to 0. See kdtrace_thread_ctor().

In D19273#412540, @cem wrote:

Solve the race using a per-process generation counter which gets updated
before a tracepoint is removed from the hash table, and after it is
removed from the target process' vmspace.

We do the latter, but I think we're missing the former.

Why? The code immediately after the increment updates the hash table.

In D19273#412540, @cem wrote:

Solve the race using a per-process generation counter which gets updated
before a tracepoint is removed from the hash table, and after it is
removed from the target process' vmspace.

We do the latter, but I think we're missing the former.

Why? The code immediately after the increment updates the hash table.

Ah, sorry, I misread that sentence / missed the context. For some reason, I thought we would have two increments around either side of the vmspace removal -> hash modification.

Ok, I think this should work. Thanks again.

sys/cddl/contrib/opensolaris/uts/common/dtrace/fasttrap.c
1062–1068 ↗(On Diff #54172)

This comment is now stale(r).

1099–1104 ↗(On Diff #54172)

what's the locking scheme here? seems like this side uses bucket->ftb_mtx mutex but the read side uses fasttrap_tp_lock (rmlock). I guess this is sort of an RCU-like postponed free/cleanup; for now we're just removing it from the list, membership change managed by the mutex. in that case, though, why do we need the rmlock to read the list?

1108 ↗(On Diff #54172)

This is implemented as sfence. Doesn't it need a full mfence?

sys/cddl/contrib/opensolaris/uts/intel/dtrace/fasttrap_isa.c
1077–1078 ↗(On Diff #54172)

To be a little more specific, we only bump p_fasttrap_tp_gen when a tracepoint was removed associated with this process (pp) was removed from the hash.

1011–1030 ↗(On Diff #54153)

PHOLD is fine as long as it isn't too much additional overhead. Given we can avoid it 99.9% of the time, I think it's ok. Thanks!

1057–1058 ↗(On Diff #54153)

Oh, I missed that, sorry. Thanks.

This revision is now accepted and ready to land.Feb 21 2019, 3:52 AM
markj added inline comments.
sys/cddl/contrib/opensolaris/uts/common/dtrace/fasttrap.c
1062–1068 ↗(On Diff #54172)

It's definitely bogus, but I don't see how this diff changes anything.

1099–1104 ↗(On Diff #54172)

Yeah, they implemented a poor man's RCU. See fasttrap_mod_barrier(): before freeing a tracepoint structure you need to acquire the fasttrap_tp_lock write lock.

1108 ↗(On Diff #54172)

It's supposed to ensure that stores before and after the barrier aren't reordered, which is what sfence does. In fact, sfence is overkill here since stores to writeback memory are already ordered on amd64.

Make a comment more accurate.

This revision now requires review to proceed.Feb 21 2019, 4:11 AM
cem added inline comments.
sys/cddl/contrib/opensolaris/uts/common/dtrace/fasttrap.c
1062–1068 ↗(On Diff #54172)

Yeah, not a regression introduced in this change. Maybe channeling bde too much, sorry.

1099–1104 ↗(On Diff #54172)

Got it.

1108 ↗(On Diff #54172)

Why is store non-reordering sufficient? We don't want loads being reordered before the barrier either. (Practically: on x86, with WB memory, dependent loads are not reordered before stores either, and this is moot.)

I guess our fasttrap implementations are x86-only, but e.g. FBT nominally supports non-x86 with weaker memory models, and uses the (?)same primitives. We're getting off into the weeds; orthogonal to this change.

This revision is now accepted and ready to land.Feb 21 2019, 4:38 AM
sys/cddl/contrib/opensolaris/uts/common/dtrace/fasttrap.c
1108 ↗(On Diff #54172)

It's a fair question, I don't want to claim that all the synchronization in fasttrap is sound.

I spent some time trying to understand the purpose of the barriers and have failed. There are no corresponding membar_consumer()s in fasttrap, and all the barriers are issued while a bucket lock is held. Here, I thought the purpose of the barrier was to order the stores to *pp and ftp_gen, but that doesn't make much sense when you look at how ftp_gen is used. So I'm pretty confused.

This revision was automatically updated to reflect the committed changes.