Page MenuHomeFreeBSD

Make hwpmc work for userspace binaries again
AcceptedPublic

Authored by gallatin on Thu, Nov 18, 7:06 PM.

Details

Summary

hwpmc has been utterly broken for userspace binaries, and has been labeling all samples from userspace binaries as dubious frames.
The issues are that:

  • the check for ph.p_offset & (-ph.p_align) == 0 seemed entirely bogus, and was preventing any binary's exec. section from being considred. Replace this with a check for the current vaddr == 0 (meaning we're only taking the 1st exec. section in a binary)
  • due to how map-ins are recorded, we were double-counting the offsets for shared libraries. Jessica came up with a brilliant workaround, which is to simply not set the vaddr for relocatable objects.

Thanks so much to Jessica for all the help in making hwpmc work correctly. This is mostly her work.

Test Plan
  • collect a sample of your system. similar to: pmcstat -n 1000000 -S resource_stalls.any -S cpu_clk_unhalted.thread_p -O out.pmclog_cpi01 sleep 10
  • process the log file and produce a graph: pmcstat -R /d/fg/out.pmclog_cpi01 -G foo.graph
  • Ensure there are a low number of "dubious frames", and that dynamically linked userspace process have stacks recorded in the profile. Note it may be necessary to increase the number of pmc buffers (kern.hwpmc.nbuffers_pcpu=128 in loader.conf), as map-in records may be lost (hence preventing resolution of some executable sections, since they are missing from the logs)

Diff Detail

Repository
R10 FreeBSD src repository
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

I generally trust @jrtc27 on this.

@brett_gutste.in Does the fix to ignore offsets for !ET_REL help with PIE binaries (e.g. does it reduce the scope of some of your patch over in CheriBSD to fix hwpmc on PIE?)

This revision is now accepted and ready to land.Tue, Nov 30, 7:05 PM
  • the check for ph.p_offset & (-ph.p_align) == 0 seemed entirely bogus, and was preventing any binary's exec. section from being considred. Replace this with a check for the current vaddr == 0 (meaning we're only taking the 1st exec. section in a binary)

The intent of this check was to find the first executable segment, under the assumption that it would be in the first page of the file (the check is "page == 0", since -align just gets you a mask for the upper bits). This perhaps used to be the case with BFD (looking at /bin/sh on a random Linux box it does have such a layout), but is not the case with LLD. So the commit message should probably be clarified to explain that.

(Also typo on "considred")

  • due to how map-ins are recorded, we were double-counting the offsets for shared libraries. Jessica came up with a brilliant workaround, which is to simply not set the vaddr for relocatable objects.

Again, could be clearer. The double-counting was specifically for the offset of the executable segment within the library (which, again, used to not matter because that would be 0 with BFD due to being the first page, but matters with LLD), not the offset (base address) of the entire library in memory; currently it's a bit ambiguous which offset you're talking about (or even if it's both).

As for the change here, it's a fix not a workaround, and "for relocatable objects" isn't the important part. Previously it was eh.e_type == ET_EXEC, which excluded ET_DYN (shared libraries) and ET_REL (.o files, which are actually what .ko files are on amd64 and mips, hence why they're handled by pmcstat at all), but should have been eh.e_type == ET_EXEC || eh.e_type == ET_DYN, which eh.e_type != ET_REL is a shorter way of writing due to there only being three possibilities (and we've already checked that it's one of the three above).

Though, now I revisit this, I realise this isn't actually quite right when the two diffs are composed like this; if a shared library has more than one executable segment and the first one is within the first page then pi_vaddr will be set to the second one's, since I used pi_vaddr == 0 as a proxy for "not yet found", which works for executables (as it was when I originally wrote that line) but not shared libraries, for which I wrote the patch second. We should probably instead do:

diff --git a/lib/libpmcstat/libpmcstat_image.c b/lib/libpmcstat/libpmcstat_image.c
index 9ee7097e95ec..798590044759 100644
--- a/lib/libpmcstat/libpmcstat_image.c
+++ b/lib/libpmcstat/libpmcstat_image.c
@@ -295,6 +295,7 @@ pmcstat_image_get_elf_params(struct pmcstat_image *image,
 	size_t i, nph, nsh;
 	const char *path, *elfbase;
 	char *p, *endp;
+	bool first_exec_segment;
 	uintfptr_t minva, maxva;
 	Elf *e;
 	Elf_Scn *scn;
@@ -384,7 +385,7 @@ pmcstat_image_get_elf_params(struct pmcstat_image *image,
 	 * loaded.  Additionally, for dynamically linked executables,
 	 * save the pathname to the runtime linker.
 	 */
-	if (eh.e_type == ET_EXEC) {
+	if (eh.e_type != ET_REL) {
 		if (elf_getphnum(e, &nph) == 0) {
 			warnx(
 "WARNING: Could not determine the number of program headers in \"%s\": %s.",
@@ -392,6 +393,7 @@ pmcstat_image_get_elf_params(struct pmcstat_image *image,
 			    elf_errmsg(-1));
 			goto done;
 		}
+		first_exec_segment = true;
 		for (i = 0; i < eh.e_phnum; i++) {
 			if (gelf_getphdr(e, i, &ph) != &ph) {
 				warnx(
@@ -416,8 +418,10 @@ pmcstat_image_get_elf_params(struct pmcstat_image *image,
 				break;
 			case PT_LOAD:
 				if ((ph.p_flags & PF_X) != 0 &&
-				    (ph.p_offset & (-ph.p_align)) == 0)
+				    first_exec_segment) {
 					image->pi_vaddr = ph.p_vaddr & (-ph.p_align);
+					first_exec_segment = false;
+				}
 				break;
 			}
 		}

(you could instead mess with initialising pi_vaddr to -1 since that's not a valid value for pi_vaddr, but it's needlessly hard to follow)

In D33055#750301, @jhb wrote:

I generally trust @jrtc27 on this.

@brett_gutste.in Does the fix to ignore offsets for !ET_REL help with PIE binaries (e.g. does it reduce the scope of some of your patch over in CheriBSD to fix hwpmc on PIE?)

I believe it should fix everything for PIEs except the lack of knowledge about what the base load address was, since won't get a MAP_IN event for it like shared libraries?

In D33055#750301, @jhb wrote:

I generally trust @jrtc27 on this.

@brett_gutste.in Does the fix to ignore offsets for !ET_REL help with PIE binaries (e.g. does it reduce the scope of some of your patch over in CheriBSD to fix hwpmc on PIE?)

I believe it should fix everything for PIEs except the lack of knowledge about what the base load address was, since won't get a MAP_IN event for it like shared libraries?

Yes this was the primary change besides plumbing information about the loaded entry address through.

While making these changes we may also want to set image->pi_entry to eh.e_entry in pmcstat_image_get_elf_params(), as this currently isn't done. Adding this and an image->pi_ispositionindependent field were the only other non-plumbing changes I made in my patch.