Page MenuHomeFreeBSD

Implement per-thread counters for PMC sampling

Authored by mmacy on Nov 20 2015, 2:01 AM.



As a followup to r290930 (D4122), this implements per-thread counters for
PMC sampling. The thread descriptors are stored in a list attached to the
process descriptor. These thread descriptors can store any per-thread
information necessary for current or future features. For the moment, they
just store the counters for sampling.

The thread descriptors are created when the process descriptor is created.
Additionally, thread descriptors are created or freed when threads
are started or stopped. Because the thread exit function is called in a
critical section, we can't directly free the thread descriptors. Hence,
they are freed to a cache, which is also used as a source of allocations
when needed for new threads.

Approved by: gnn (mentor) [Assuming...]
MFC after: 1 month
Sponsored by: Juniper Networks

Test Plan

I created two test programs: one which starts and stops tons of threads,
with each thread doing just a little bit of work. I verified that the
thread descriptors were created and deleted correctly.

A second program runs just a few threads for a long time. It appears that
sampling is being correctly gathered for these threads.

Diff Detail

rS FreeBSD src repository
Lint OK
No Unit Test Coverage
Build Status
Buildable 1227
Build 1231: arc lint + arc unit

Event Timeline

jtl retitled this revision from to Implement per-thread counters for PMC sampling.
jtl updated this object.
jtl edited the test plan for this revision. (Show Details)
jtl added reviewers: gnn, sjg, bz, jhb.

Did you run the test for your previous change as well? It would be good to test that this does not introduce a regression in what you fixed earlier (it shouldn't but would be good to check)


This leaks the P_HWPMC flag if the below tests fail (e.g. the ENOMEM case)


Tiny style(9) nit would be to not initialize variables in declarations. You could initialize it to NULL near the first line ('p = td->td_proc') instead.


Explicit 'void' in function args for C.


Rather than using a timer, you could use a "fast" taskqueue. You can then synchronously call taskqueue_enqueue_fast() from pmc_thread_descriptor_pool_free() if the number was above the limit. The task handler would recheck the value as below.


I would init tdlistsz to 32 above the restart: label (style nit).


Spaces around '='


Nothing keeps these thread pointers from becoming invalid once the proc lock is dropped (if you raced with pthread_exit() for example). Instead you might need to do something closer to this:

         tdcnt = 0;

         FOREACH_THREAD_IN_PROC(p, curtd)
              if (curtd doesn't have a descriptor)

         if (tdcnt == 0)

         tdlist = alloc tdcnt thread descriptors (just malloc them, not init)
        i = 0;
        FOREACH_THREAD_IN_PROC(p, curtd) {
            if (curtd needs a descriptor) {
                use tdlist[i] for curtd
                if (i == tdcnt)

        while (i < tdcnt) {
            free tdlist[i];

        free tdlist array
        goto restart;

Who can race with the caller of this thread to read the pp_tds list? If we are about to free pp then we must have the only reference so we shouldn't need the spin lock here at all. If we do then there is a race with free'ing 'pp' that needs to be fixed instead.

Is this still the most updated patch of this work?

jtl does this need further work?

In D4227#229888, @sjg wrote:

jtl does this need further work?


This is the most recent version of the patch.

AFAIK, what is necessary is to:

  • Rebase this on head. (IIRC, it was primarily developed on a private fork of stable/10, although this patch should be based on head before stable/11 branched.) I believe the biggest changes there may be moving some of the code around to follow functions that moved during some restructuring that occurred between kern_thread.c and kern_thr.c.
  • Address @jhb's comments. I believe they are valid and merit fixing.

After that, we can commit D7350, which depends on this. That patch uses this per-thread infrastructure to capture per-thread user callchains. That was the actual need I was trying to address.

FWIW, this was much more important to $work[-1] than it is to $work[0], and it is unclear when I will get either personal or work time to finish this up and test it. If your organization has a need, feel free to finish it off and commit it. If not, I'll leave it on my 'to do' list to try to finish up before stable/12 branches... :-(

mmacy added a reviewer: jtl.
mmacy added a subscriber: mmacy.

Committed in r333690