
Fix two issues that caused sampling to effectively stop.
ClosedPublic

Authored by jhb on May 15 2015, 6:28 PM.

Details

Summary

Fix two bugs that could result in PMC sampling effectively stopping.
In both cases, the effect of the bug was that a very small positive
number was written to the counter, so a large number of events had to
occur before the next sampling interrupt would trigger. Even with very
frequently occurring events such as clock cycles, wrapping all the way
around could take a long time. Both bugs occurred when updating the
saved reload count for an outgoing thread on a context switch.
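
To give a sense of scale, the following is a minimal sketch of the arithmetic, not code from the diff. It assumes a 48-bit counter (taken from the 2^48 figure mentioned further down) that raises its interrupt when it overflows; the names PMC_WIDTH, events_until_overflow() and seconds_until_overflow() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed counter width, based on the 2^48 figure in the summary. */
#define PMC_WIDTH 48

/* Events that must occur before a counter holding `raw` overflows. */
static uint64_t
events_until_overflow(uint64_t raw)
{
	const uint64_t limit = UINT64_C(1) << PMC_WIDTH;

	assert(raw < limit);
	return (limit - raw);
}

/*
 * Time until the next sampling interrupt, given a hypothetical event
 * rate in events per second.
 */
static double
seconds_until_overflow(uint64_t raw, double events_per_sec)
{
	return ((double)events_until_overflow(raw) / events_per_sec);
}
```

With a raw value of 2 and an assumed rate of 3e9 events per second (clock cycles at 3 GHz), seconds_until_overflow() comes to roughly 26 hours, which is why sampling appears to stop.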

First, the counter-independent code compares the current reload count
against the count set when the thread switched in and generates a delta
to apply to the saved count. If this delta caused the saved count to go
negative, the code added a full reload interval to wrap it around to a
positive value, but a result of exactly zero was left as is. The fix is
to also add the full reload interval if the resulting count is zero.
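
The first case can be sketched roughly as follows. This is an illustration under stated assumptions rather than the code in the diff: update_saved_count() and its parameters are hypothetical, and it assumes the delta is the number of events consumed while the thread ran, so it is subtracted from the saved count.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: apply the events counted while the thread ran
 * (`delta`) to its saved count and return the count to use when the
 * thread next switches in.
 */
static int64_t
update_saved_count(int64_t saved, int64_t delta, int64_t reload)
{
	assert(reload > 0);
	saved -= delta;
	/*
	 * A check of `saved < 0` would let an exact zero through; `<= 0`
	 * treats zero the same as a negative count.
	 */
	if (saved <= 0)
		saved += reload;
	return (saved);
}
```

For example, with a reload interval of 1000, a saved count of 100 and a delta of 100 yields 1000 rather than 0.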

Second, occasionally the raw counter value read during a context switch
has actually wrapped, but an interrupt has not yet triggered. In this
case the existing logic would return a very large reload count (e.g.
2^48 - 2 if the counter had overflowed by a count of 2). This was seen
both for fixed-function and programmable counters on an E5-2643.
Work around this case by returning a reload count of zero.
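
A rough sketch of the second case, again under assumptions not stated in the summary: the counter is 48 bits wide, a counter armed for sampling always has its top bit set (reload counts are below 2^47), and a clear top bit therefore means the counter has wrapped. The name raw_to_reload_count() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed counter width, based on the 2^48 figure in the summary. */
#define PMC_WIDTH 48

/* Convert a raw counter value into the remaining reload count. */
static uint64_t
raw_to_reload_count(uint64_t raw)
{
	const uint64_t limit = UINT64_C(1) << PMC_WIDTH;

	assert(raw < limit);
	/*
	 * Assumption: a clear top bit means the counter already wrapped
	 * but the interrupt has not fired yet, so no events remain.
	 */
	if ((raw & (limit >> 1)) == 0)
		return (0);
	return (limit - raw);
}
```

Without the top-bit check, a raw value of 2 would be reported as a reload count of 2^48 - 2; with it, the function returns 0.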

PR: 198149

Test Plan
  • Run pmcstat -P in top-mode against a spinning process using 15 spinning threads on a 16-core machine. After a few seconds the sampling stops.
  • I added debug printfs (not shown in the diff) to verify that both cases happened with both IAF (CPU_CLK_UNHALTED_REF) and IAP (UOPS_RETIRED.ALL) counters on a dual-socket system with E5-2643 CPUs.

Diff Detail

Repository
rS FreeBSD src repository - subversion

Event Timeline

jhb retitled this revision to "Fix two issues that caused sampling to effectively stop."
jhb updated this object.
jhb edited the test plan for this revision.
jhb added reviewers: emaste, davide, adrian, stas.
emaste edited edge metadata.
This revision is now accepted and ready to land. May 15 2015, 8:11 PM
This revision was automatically updated to reflect the committed changes.