Improve accuracy of PMC sampling frequency
AbandonedPublic
Actions

Authored by jtl on Nov 10 2015, 7:20 PM.

Details

Reviewers

sjg
jhb
gnn
bz

Summary

The code tracks a counter which is the number of events until the next
sample. On context switch in, it loads the saved counter. On context
switch out, it tries to calculate a new saved counter.

Problems:

The saved counter was shared by all threads in a process. However, this

means that all threads would be initially loaded with the same saved
counter. However, that could result in sampling more often than once every
X number of events.

The calculation to determine a new saved counter was backwards. It

added when it should have subtracted, and subtracted when it should have
added. Assume a single-threaded process with a reload count of 1000 events.
Assuming the counter on context switch in was 100 and the counter on context
switch out was 50 (meaning the thread has "consumed" 50 more events), the
code would calculate a new saved counter of 150 (instead of the proper 50).

Fix:

As soon as the saved counter is used to initialize a monitor for a

thread on context switch in, set the saved counter to the reload count.
That way, subsequent threads to use the saved counter will get the full
reload count, assuring we sample at least once every X number of events
(across all threads).

Change the calculation of the saved counter. Due to the change to the

saved counter in #1, we simply need to add (modulo the reload count) the
remaining counter time we retrieve from the CPU when a thread is context
switched out.

Sponsored by: Juniper Networks
Approved by: gnn (mentor) [Assuming he does, of course. ;-)]
MFC after: 1 month

Test Plan

Prior to applying this patch, I saw unexpected variations (sometimes large)
in the number of samples gathered by HWPMC.

After applying this patch, the number of samples gathered by HWPMC became
more stable.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 1133
Build 1137: arc lint + arc unit

Event Timeline

jtl updated this revision to Diff 10103.Nov 10 2015, 7:20 PM

jtl retitled this revision from to Improve accuracy of PMC sampling frequency.

jtl updated this object.

jtl edited the test plan for this revision. (Show Details)

jtl added reviewers: sjg, gnn, jhb, bz.

Herald added a subscriber: imp. · View Herald TranscriptNov 10 2015, 7:20 PM

gnn accepted this revision.Nov 12 2015, 2:27 AM

gnn edited edge metadata.

This revision is now accepted and ready to land.Nov 12 2015, 2:27 AM

jhb added inline comments.Nov 12 2015, 4:10 AM

sys/dev/hwpmc/hwpmc_mod.c
1279	Is this comment stale now since it still says "per-process"?
1285	Nit: style(9) prefers full sentences for comments, and a newline before a new comment. I would perhaps expand the comment further along the lines of: /* * Use the saved value from the most recent thread switch out to start * this counter. Reset the saved count so in case another thread from this * process switches in before any threads switch out. */ I think this still highlights a bit of a flaw in the approach though. Suppose you had a process with two threads. Suppose the reload count is N. Suppose that for whatever reason, both threads A and B always trigger N - 1 events during each "slice" between context switches. You can then have this timeline: Thread A starts with an initial count of N. Thread B starts with an initial count of N. A switches out after N - 1 events, so no sample is logged during this slice, and the saved count is set to 1. B switches out after N - 1 events, so no sample is logged either. foo: A resumes with a count of 1. This resets the saved count to N. B resumes with a count of N. A switches out after N - 1 events, (1 sample logged) and the saved count is set to 2. B switches out after N - 1 events, no samples logged, saved count set to 1. goto foo and repeat The end result in this pathological case is that B would never log any samples. I'm not sure how you can fix this apart from having per-thread saved counts. Part of the problem is that once you get in the loop, A keeps resuming with B's saved count (which is a condition that can occur in the existing algorithm as well).
1444	This is definitely clearer than the old approach, I almost made a similar change the last time I looked at this as well as the current code is non-obvious. If pp_pmcval were per-thread you'd think you could just do 'pp_pmcval = newvalue' directly.

jtl added inline comments.Nov 12 2015, 2:16 PM

sys/dev/hwpmc/hwpmc_mod.c
1279	It isn't really stale; we're still tracking this on a per-process basis. (Although, see my later comments on threads.)
1444	If pp_pmcval were per-thread ... I made this change to improve the algorithm, but I definitely think the per-thread approach is ultimately better (albeit more complex in other ways). If you'd like, I can work up such a change instead. I have a chunk of the code from another change I was considering.