Differential D35709

fix pmcstat
Closed · Public

Authored by gallatin on Jul 4 2022, 3:48 PM.

Details

Reviewers

tsoome

Commits

rG0aa150775179: pmcstat: fix log analysis

Summary

pmcstat log analysis has been broken since D35342 (b6e28991bf3aadb).

The cause is that the PMC for the first CPU (CPU 0) is not added when reading logs: unlike its clones, its event ID is not PMC_ID_INVALID. As a result, we fail the assertion at lib/libpmcstat/libpmcstat_logging.c:293 whenever we encounter samples from CPU 0.

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

gallatin created this revision. Jul 4 2022, 3:48 PM

Herald added a subscriber: imp. · Jul 4 2022, 3:48 PM

gallatin requested review of this revision. Jul 4 2022, 3:48 PM

gallatin added a reviewer: tsoome.

Seems to be good, and on this arm64 machine it does work:

[1] + Done /usr/obj/usr/home/toomas.soome/src/arm64.aarch64/usr.sbin/pmcstat/pmcstat -n 1000000 -S CPU_CYCLES -O out.pmclog_cpi01 sleep 15
root@snow3:~toomas.soome/src/usr.sbin/pmcstat #
root@snow3:~toomas.soome/src/usr.sbin/pmcstat # /usr/obj/usr/home/toomas.soome/src/arm64.aarch64/usr.sbin/pmcstat/pmcstat -R out.pmclog_cpi01 -z 32 -G out.pmcstat_cpi01
CONVERSION STATISTICS:
#exec/elf 6
#samples/total 28
root@snow3:~toomas.soome/src/usr.sbin/pmcstat #

This revision is now accepted and ready to land. Jul 4 2022, 4:08 PM

Closed by commit rG0aa150775179: pmcstat: fix log analysis (authored by gallatin). · Jul 4 2022, 4:43 PM

This revision was automatically updated to reflect the committed changes.

gallatin added a commit: rG0aa150775179: pmcstat: fix log analysis.

mhorne added a subscriber: mhorne. Jul 4 2022, 4:52 PM

mhorne added inline comments.

usr.sbin/pmcstat/pmcstat.c, line 1190:
Does `pmc_allocate()` with an event ID of `PMC_ID_INVALID` return zero?

gallatin added inline comments. Jul 4 2022, 6:03 PM

usr.sbin/pmcstat/pmcstat.c, line 1190:
I don't see the err() text printed, so it must. I don't pretend to know how this works; I debugged this by realizing that the samples that were causing pmcstat to crash were coming from CPU 0, and that CPU 0 was apparently not being added somewhere. I deduced that CPUs 1..N were cloned from CPU 0 and had PMC_ID_INVALID immediately after being cloned. Removing this check got CPU 0 added wherever it needed to be, which keeps pmcstat from crashing because it could not look up those samples when analyzing logs. I'm happy to take a better fix, but what we have now unbreaks a performance-analysis workflow that has worked for years.

One of my graduate students found that this change had a seriously bad side effect. Specifically, on a Ryzen processor, instead of being able to collect data from 6 counters simultaneously, he could only configure 3 counters. So, we backed out this change locally.

In D35709#950480, @alc wrote:

One of my graduate students found that this change had a seriously bad side effect. Specifically, on a Ryzen processor, instead of being able to collect data from 6 counters simultaneously, he could only configure 3 counters. So, we backed out this change locally.

Thanks Alan, I have found the same, and I have a fix for it. The problem is that we now allocate the requested event twice on CPU 0, thus reducing the total number of available counters by two.

I will put the fix up for review within the next week, and make sure it is present in 14.0.

In D35709#950481, @mhorne wrote:

Thanks Alan, I have found the same, and I have a fix for it. The problem is that we now allocate the requested event twice on CPU 0, thus reducing the total number of available counters by two.

I will put the fix up for review within the next week, and make sure it is present in 14.0.

That's good to hear!

Out of curiosity, is anyone working on PEBS/IBS support?

In D35709#950482, @alc wrote:

In D35709#950481, @mhorne wrote:

Thanks Alan, I have found the same, and I have a fix for it. The problem is that we now allocate the requested event twice on CPU 0, thus reducing the total number of available counters by two.

I will put the fix up for review within the next week, and make sure it is present in 14.0.

I forgot to follow up here. The fix (c362fe939f6f) has landed in stable/14, and I will request to merge it to releng/14.0 on Thursday.

That's good to hear!

Out of curiosity, is anyone working on PEBS/IBS support?

Sort of. @br has developed a new "Hardware Tracing" framework, separate from pmc/hwpmc, which aims to enable these types of profiling features. The work currently focuses on supporting Coresight/ARM SPE, rather than x86, but this paves the way so that adding classes for e.g. Intel PT will be the "easy part".

See D40466, D40477, D40728.

In D35709#961583, @mhorne wrote:

In D35709#950482, @alc wrote:

I forgot to follow up here. The fix (c362fe939f6f) has landed in stable/14, and I will request to merge it to releng/14.0 on Thursday.

That's good to hear!

Out of curiosity, is anyone working on PEBS/IBS support?

Sort of. @br has developed a new "Hardware Tracing" framework, separate from pmc/hwpmc, which aims to enable these types of profiling features. The work currently focuses on supporting Coresight/ARM SPE, rather than x86, but this paves the way so that adding classes for e.g. Intel PT will be the "easy part".

See D40466, D40477, D40728.

Thanks for the update.

Before I forget, we also ran into a second bug: Specifying ",usr" on a counter no longer worked. Essentially, the ",usr" was just being ignored. We haven't checked if any of the recent changes might have addressed this issue.

In D35709#961602, @alc wrote:

Before I forget, we also ran into a second bug: Specifying ",usr" on a counter no longer worked. Essentially, the ",usr" was just being ignored. We haven't checked if any of the recent changes might have addressed this issue.

I am not surprised, and I doubt anyone has fixed it (though it should be straightforward). This is on an AMD machine?

In D35709#961603, @mhorne wrote:

I am not surprised, and I doubt anyone has fixed it (though it should be straightforward). This is on an AMD machine?

Yes.

Revision Contents
Changeset List

Path                          Size
usr.sbin/pmcstat/pmcstat.c    3 lines

Diff 107751