Page MenuHomeFreeBSD

fix pmcstat
ClosedPublic

Authored by gallatin on Jul 4 2022, 3:48 PM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Nov 6, 7:36 AM
Unknown Object (File)
Wed, Nov 6, 5:40 AM
Unknown Object (File)
Oct 8 2024, 9:00 AM
Unknown Object (File)
Oct 4 2024, 10:07 PM
Unknown Object (File)
Oct 4 2024, 3:51 AM
Unknown Object (File)
Sep 23 2024, 2:21 PM
Unknown Object (File)
Sep 1 2024, 1:50 AM
Unknown Object (File)
Aug 29 2024, 12:20 PM
Subscribers

Details

Summary

pmcstat has been broken for analyzing logs since D35342 / b6e28991bf3aadb.

This is because the pmc for the first CPU is not added when reading logs because unlike its clones, its event id is not invalid. That causes us to fail the assertion at lib/libpmcstat/libpmcstat_logging.c:293 when encountering samples from cpu0.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

gallatin added a reviewer: tsoome.

Seems to be good. and on this arm64 machine, it does work:

[1] + Done /usr/obj/usr/home/toomas.soome/src/arm64.aarch64/usr.sbin/pmcstat/pmcstat -n 1000000 -S CPU_CYCLES -O out.pmclog_cpi01 sleep 15
root@snow3:~toomas.soome/src/usr.sbin/pmcstat #
root@snow3:~toomas.soome/src/usr.sbin/pmcstat # /usr/obj/usr/home/toomas.soome/src/arm64.aarch64/usr.sbin/pmcstat/pmcstat -R out.pmclog_cpi01 -z 32 -G out.pmcstat_cpi01
CONVERSION STATISTICS:
#exec/elf 6
#samples/total 28
root@snow3:~toomas.soome/src/usr.sbin/pmcstat #

This revision is now accepted and ready to land.Jul 4 2022, 4:08 PM
This revision was automatically updated to reflect the committed changes.
mhorne added inline comments.
usr.sbin/pmcstat/pmcstat.c
1190

Does pmc_allocate() with an event ID of PMC_ID_INVALID return zero?

usr.sbin/pmcstat/pmcstat.c
1190

I don't see the err() text printed, so it must.

I don't pretend to know how this works, I debugged this by realizing that the samples that were causing pmcstat to crash were coming from CPU 0, and that CPU 0 was apparently not added someplace. I deduced that CPUs 1..N were cloned from CPU 0 and had PMC_ID_INVALID immediately after being cloned. Removing this check got CPU 0 added wherever it needed to be added to keep pmcstat from crashing due to being unable to lookup the samples when analyzing logs. I'm happy if there is a better fix, but what we have now unbreaks a perf analysis workflow that has worked for years.

One of my graduate students found that this change had a seriously bad side effect. Specifically, on a Ryzen processor, instead of being able to collect data from 6 counters simultaneously, he could only configure 3 counters. So, we backed out this change locally.

In D35709#950480, @alc wrote:

One of my graduate students found that this change had a seriously bad side effect. Specifically, on a Ryzen processor, instead of being able to collect data from 6 counters simultaneously, he could only configure 3 counters. So, we backed out this change locally.

Thanks Alan, I have found the same, and I have a fix for it. The problem is that we now allocate the requested event twice on CPU 0, thus reducing the total number of available counters by two.

I will put the fix up for review within the next week, and make sure it is present in 14.0.

In D35709#950480, @alc wrote:

One of my graduate students found that this change had a seriously bad side effect. Specifically, on a Ryzen processor, instead of being able to collect data from 6 counters simultaneously, he could only configure 3 counters. So, we backed out this change locally.

Thanks Alan, I have found the same, and I have a fix for it. The problem is that we now allocate the requested event twice on CPU 0, thus reducing the total number of available counters by two.

I will put the fix up for review within the next week, and make sure it is present in 14.0.

That's good to hear!

Out of curiosity, is anyone working on PEBS/IBS support?

In D35709#950482, @alc wrote:
In D35709#950480, @alc wrote:

One of my graduate students found that this change had a seriously bad side effect. Specifically, on a Ryzen processor, instead of being able to collect data from 6 counters simultaneously, he could only configure 3 counters. So, we backed out this change locally.

Thanks Alan, I have found the same, and I have a fix for it. The problem is that we now allocate the requested event twice on CPU 0, thus reducing the total number of available counters by two.

I will put the fix up for review within the next week, and make sure it is present in 14.0.

I forgot to follow up here. The fix (c362fe939f6f) has landed in stable/14, and I will request to merge it to releng/14.0 on Thursday.

That's good to hear!

Out of curiosity, is anyone working on PEBS/IBS support?

Sort of. @br has developed a new "Hardware Tracing" framework, separate from pmc/hwpmc, which aims to enable these types of profiling features. The work currently focuses on supporting Coresight/ARM SPE, rather than x86, but this paves the way so that adding classes for e.g. Intel PT will be the "easy part".

See D40466, D40477, D40728.

In D35709#950482, @alc wrote:
In D35709#950480, @alc wrote:

One of my graduate students found that this change had a seriously bad side effect. Specifically, on a Ryzen processor, instead of being able to collect data from 6 counters simultaneously, he could only configure 3 counters. So, we backed out this change locally.

Thanks Alan, I have found the same, and I have a fix for it. The problem is that we now allocate the requested event twice on CPU 0, thus reducing the total number of available counters by two.

I will put the fix up for review within the next week, and make sure it is present in 14.0.

I forgot to follow up here. The fix (c362fe939f6f) has landed in stable/14, and I will request to merge it to releng/14.0 on Thursday.

That's good to hear!

Out of curiosity, is anyone working on PEBS/IBS support?

Sort of. @br has developed a new "Hardware Tracing" framework, separate from pmc/hwpmc, which aims to enable these types of profiling features. The work currently focuses on supporting Coresight/ARM SPE, rather than x86, but this paves the way so that adding classes for e.g. Intel PT will be the "easy part".

See D40466, D40477, D40728.

Thanks for the update.

Before I forget, we also ran into a second bug: Specifying ",usr" on a counter no longer worked. Essentially, the ",usr" was just being ignored. We haven't checked if any of the recent changes might have addressed this issue.

In D35709#961602, @alc wrote:

Before I forget, we also ran into a second bug: Specifying ",usr" on a counter no longer worked. Essentially, the ",usr" was just being ignored. We haven't checked if any of the recent changes might have addressed this issue.

I am not surprised, and I doubt anyone has fixed it (though it should be straightforward). This is on an AMD machine?

In D35709#961602, @alc wrote:

Before I forget, we also ran into a second bug: Specifying ",usr" on a counter no longer worked. Essentially, the ",usr" was just being ignored. We haven't checked if any of the recent changes might have addressed this issue.

I am not surprised, and I doubt anyone has fixed it (though it should be straightforward). This is on an AMD machine?

Yes.