hwpmc: add PMU event/group structures and rotation runtime
Needs Revision · Public

Authored by raghavendra.kt_amd.com on Thu, Jun 18, 7:06 AM.

Details

Reviewers

imp
mhorne
markj
jhb
afscoelho_gmail.com
ali_mashtizadeh.com
gnn

Group Reviewers

pmc

Summary

Add a thin virtual layer (sys/dev/hwpmc/hwpmc_pmu.{h,c}) on top of the
existing hwpmc row programming model, plus the multiplex rotation
back-end that the v0 series left as stubs.

Types (exposed as typedefs per review feedback):

pmu_event_t		per-PMC state used to defer HW row binding
			until commit/attach: state, leader flag,
			pre-computed scheduling constraint, and the
			time-enabled / time-running counters used
			by the multiplex layer.

pmu_group_t		leader + sibling list with all-or-none
			commit semantics.  Carries multiplex
			bookkeeping (pg_running, pg_defer_ok,
			pg_assigned, pg_used_rows_mask).

pmc_sched_constraint_t	allowed-row bitmask + popcount-as-weight
			+ FIXED/EXCLUSIVE/SHARED flags.  Lower
			weight means more constrained, which the
			assigner uses for most-constrained-first
			greedy placement.
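To illustrate why a lower popcount weight should be placed first, here is a minimal userland sketch of most-constrained-first greedy placement. It is not the patch's assigner. struct sched_constraint, make_constraint() and assign_rows() are hypothetical names, the FIXED/EXCLUSIVE/SHARED flags are left out, and assert() stands in for KASSERT.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in for pmc_sched_constraint_t: only the
 * allowed-row mask and its popcount weight are modeled; the FIXED,
 * EXCLUSIVE and SHARED flags are left out.
 */
struct sched_constraint {
	uint32_t	allowed_rows;	/* bit i set: event may use row i */
	int		weight;		/* popcount(allowed_rows) */
};

static int
popcount32(uint32_t v)
{
	int n;

	for (n = 0; v != 0; n++)
		v &= v - 1;		/* clear the lowest set bit */
	return (n);
}

static struct sched_constraint
make_constraint(uint32_t mask)
{
	struct sched_constraint c;

	c.allowed_rows = mask;
	c.weight = popcount32(mask);
	return (c);
}

/*
 * Most-constrained-first greedy placement: visit events in order of
 * increasing weight and give each one the lowest free row it may use.
 * Returns 0 and fills rows[] on success, -1 if some event cannot be placed.
 */
static int
assign_rows(const struct sched_constraint *c, int n, int *rows)
{
	uint32_t freebits, used;
	int i, j, order[32], row, tmp;

	assert(n >= 0 && n <= 32);
	for (i = 0; i < n; i++)
		order[i] = i;
	/* Stable insertion sort of event indices by weight. */
	for (i = 1; i < n; i++) {
		tmp = order[i];
		for (j = i; j > 0 && c[order[j - 1]].weight > c[tmp].weight; j--)
			order[j] = order[j - 1];
		order[j] = tmp;
	}
	used = 0;
	for (i = 0; i < n; i++) {
		freebits = c[order[i]].allowed_rows & ~used;
		if (freebits == 0)
			return (-1);
		for (row = 0; (freebits & (1u << row)) == 0; row++)
			;
		rows[order[i]] = row;
		used |= 1u << row;
	}
	return (0);
}
```

Suppose event 0 may use rows 0-3 (mask 0xf) and event 1 may use only row 0 (mask 0x1). The weight-1 event is then placed first and gets row 0, and event 0 takes row 1. Placing the same events in FIFO order would give row 0 to event 0 and leave event 1 with no usable row.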

Cross-architecture safety: the grouping/multiplex scheduler is only
implemented where a backend can describe per-event scheduling
constraints (x86/AMD today), so hwpmc_pmu.c and hwpmc_assign.c are
built on amd64/i386 only. hwpmc_pmu.h therefore gates the real entry
points behind HWPMC_PMU_GROUPS (defined for amd64 / i386) and
provides static-inline no-op stubs (pmu_group_on_allocate etc. return
EOPNOTSUPP; the csw/release hooks do nothing) for every other
architecture. The architecture-independent hwpmc_mod.c thus links
unchanged on arm64/arm/powerpc; behaviour there is identical to a
pre-grouping hwpmc. Extending support to another backend is a matter
of adding its constraint provider and defining HWPMC_PMU_GROUPS for it.
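The header gating described above could look roughly like this. It is a sketch under assumptions. The build is assumed to define HWPMC_PMU_GROUPS only on amd64/i386. The hook signatures taking a struct pmc * are assumed, and mod_allocate_group_pmc() is a hypothetical caller that stands in for hwpmc_mod.c. Without the macro, the static-inline stubs are used and the caller compiles and links unchanged.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct pmc;	/* opaque placeholder for the real hwpmc type */

#ifdef HWPMC_PMU_GROUPS
/* amd64/i386: real entry points, implemented in hwpmc_pmu.c. */
int	pmu_group_on_allocate(struct pmc *pm);
void	pmu_group_on_release(struct pmc *pm);
#else
/* Every other architecture: static-inline no-op stubs. */
static inline int
pmu_group_on_allocate(struct pmc *pm)
{
	(void)pm;
	return (EOPNOTSUPP);
}

static inline void
pmu_group_on_release(struct pmc *pm)
{
	(void)pm;		/* nothing was allocated, nothing to free */
}
#endif

/*
 * Hypothetical caller in architecture-independent code: it compiles and
 * links the same way whether or not HWPMC_PMU_GROUPS is defined.
 */
static int
mod_allocate_group_pmc(struct pmc *pm)
{
	assert(pm != NULL);	/* userland stand-in for KASSERT */
	return (pmu_group_on_allocate(pm));
}
```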

Group lifecycle: pmu_group_create / pmu_group_add / pmu_group_commit /
pmu_group_release / pmu_group_lookup. Per-PMC hooks:
pmu_group_on_allocate allocates the pmu_event; pmu_group_on_release
removes the pmu_event from the group TAILQ before freeing it.
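The sketch below models the all-or-none commit and the "unlink before free" release ordering with the queue(3) TAILQ macros. It is a simplification, not the patch's code. struct pmu_event / struct pmu_group are heavily trimmed, and group_add(), group_commit() and event_release() are hypothetical names. The capacity check (member count against available rows) also stands in for the real constraint-based pmu_assign_group().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/queue.h>

struct pmu_group;

/* Hypothetical, trimmed-down models of pmu_event_t / pmu_group_t. */
struct pmu_event {
	TAILQ_ENTRY(pmu_event)	pe_link;	/* sibling linkage */
	struct pmu_group	*pe_group;
	bool			pe_leader;
	int			pe_row;		/* -1 until commit */
};

struct pmu_group {
	TAILQ_HEAD(, pmu_event)	pg_events;
	int			pg_nevents;
	int			pg_nrows;	/* rows available */
};

static void
group_init(struct pmu_group *pg, int nrows)
{
	TAILQ_INIT(&pg->pg_events);
	pg->pg_nevents = 0;
	pg->pg_nrows = nrows;
}

/* Allocate and link an event; the first one added becomes the leader. */
static struct pmu_event *
group_add(struct pmu_group *pg)
{
	struct pmu_event *pe;

	if ((pe = calloc(1, sizeof(*pe))) == NULL)
		return (NULL);
	pe->pe_group = pg;
	pe->pe_leader = TAILQ_EMPTY(&pg->pg_events);
	pe->pe_row = -1;
	TAILQ_INSERT_TAIL(&pg->pg_events, pe, pe_link);
	pg->pg_nevents++;
	return (pe);
}

/* All-or-none: check capacity first, then bind a row to every member. */
static int
group_commit(struct pmu_group *pg)
{
	struct pmu_event *pe;
	int row;

	if (pg->pg_nevents > pg->pg_nrows)
		return (ENOSPC);	/* no member was bound */
	row = 0;
	TAILQ_FOREACH(pe, &pg->pg_events, pe_link)
		pe->pe_row = row++;
	return (0);
}

/* Unlink from the sibling list before freeing, so no stale pointer remains. */
static void
event_release(struct pmu_event *pe)
{
	struct pmu_group *pg;

	pg = pe->pe_group;
	assert(pg != NULL);
	TAILQ_REMOVE(&pg->pg_events, pe, pe_link);
	pg->pg_nevents--;
	free(pe);
}
```

A failed commit leaves every member unbound (pe_row == -1). The group is never left half-programmed.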

Rotation runtime:

pmu_pp_schedule_in	atomic per-pp placement of a whole group:
			pmu_assign_group, attach every sibling,
			flip pm_state from STOPPED back to
			RUNNING (without this, rotation-evicted
			PMCs would stay STOPPED forever).

pmu_pp_schedule_out	mirror image: detach, free rows, flip
			pm_state to STOPPED, mark pg_assigned
			false.

pmu_pp_kick_rotate	wake the per-pp rotation kthread.

pmu_pp_rotate_thread	one kthread per pmc_process; sleeps on
			pp_pmu_rot_thread and ticks every
			rotation_period_us microseconds.

pmu_pp_rotate_one	cursor-based round-robin: evict every
			currently scheduled group, then walk
			pp_pmu_groups starting from
			pp_pmu_rot_cursor, calling
			pmu_pp_schedule_in on each group until
			the first ENOSPC; the group that hit
			ENOSPC becomes the next tick's cursor.
			This avoids the FIFO+greedy starvation
			pattern (small groups repeatedly winning
			the leftover slots).

pmu_pp_release_all	drain the rotation kthread and sever every
			group still hooked off pp before pp is
			freed.  Wired into pmc_process_exit by
			patch 0004.
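The cursor-based round-robin in pmu_pp_rotate_one can be sketched in userland as follows. This is a model, not the kernel code. Groups are reduced to a row count, and schedule_in(), schedule_out(), rotate_one() and struct rot_state are hypothetical analogues of the functions and pp fields named above. The sketch also assumes every group fits on an otherwise empty PMU. A group that could never fit would pin the cursor and block every other group.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define	ROT_MAX_GROUPS	16

/*
 * Hypothetical per-process rotation state.  Groups are reduced to "number
 * of rows needed"; the real code would call pmu_assign_group() instead.
 */
struct rot_state {
	int	ngroups;
	int	need[ROT_MAX_GROUPS];		/* rows each group needs */
	bool	assigned[ROT_MAX_GROUPS];	/* pg_assigned analogue */
	int	total_rows;
	int	free_rows;
	int	cursor;				/* pp_pmu_rot_cursor analogue */
};

/* pmu_pp_schedule_in analogue: place the whole group or nothing. */
static int
schedule_in(struct rot_state *rs, int g)
{
	if (rs->need[g] > rs->free_rows)
		return (ENOSPC);
	rs->free_rows -= rs->need[g];
	rs->assigned[g] = true;
	return (0);
}

/* pmu_pp_schedule_out analogue: release the group's rows. */
static void
schedule_out(struct rot_state *rs, int g)
{
	if (!rs->assigned[g])
		return;
	rs->free_rows += rs->need[g];
	rs->assigned[g] = false;
}

/* One rotation tick (pmu_pp_rotate_one analogue). */
static void
rotate_one(struct rot_state *rs)
{
	int g, i;

	for (g = 0; g < rs->ngroups; g++)
		schedule_out(rs, g);
	assert(rs->free_rows == rs->total_rows);

	for (i = 0; i < rs->ngroups; i++) {
		g = (rs->cursor + i) % rs->ngroups;
		if (schedule_in(rs, g) == ENOSPC) {
			/* The group that missed out goes first next tick. */
			rs->cursor = g;
			return;
		}
	}
}
```

Consider 4 rows and three groups that need 2, 3 and 1 rows. The first tick runs group 0 and stops at group 1. The second tick starts at group 1 and runs groups 1 and 2. A FIFO walk that instead skips any group that does not fit would run groups 0 and 2 on every tick and never run group 1.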

The PMC_F_GROUP_MUX commit fallback (over-subscription accepted instead
of returning ENOSPC) is wired up so userland can already opt in via
pmcstat -b.
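A rough model of the PMC_F_GROUP_MUX commit fallback follows. It assumes that pg_defer_ok marks a group as accepted but waiting for a rotation slot. PMC_F_GROUP_MUX_SKETCH is a placeholder flag with an invented value, not the real ABI constant, and struct group_sketch and group_commit() are hypothetical. free_rows means the rows not already held by scheduled groups.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Placeholder; the real PMC_F_GROUP_MUX value lives in the hwpmc ABI headers. */
#define	PMC_F_GROUP_MUX_SKETCH	0x1u

struct group_sketch {
	unsigned	flags;
	int		need_rows;
	bool		pg_assigned;	/* scheduled on hardware now */
	bool		pg_defer_ok;	/* accepted, waiting for rotation */
};

/* free_rows: rows not already held by other scheduled groups. */
static int
group_commit(struct group_sketch *pg, int free_rows)
{
	if (pg->need_rows <= free_rows) {
		pg->pg_assigned = true;		/* fits now: schedule it */
		return (0);
	}
	if ((pg->flags & PMC_F_GROUP_MUX_SKETCH) == 0)
		return (ENOSPC);		/* default: reject over-subscription */
	pg->pg_assigned = false;
	pg->pg_defer_ok = true;			/* rotation will schedule it later */
	return (0);
}
```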

Signed-off-by: Raghavendra K T <raghavendra.kt@amd.com>

Diff Detail

Repository: rG FreeBSD src repository

Lint: Lint Skipped

Unit: Tests Skipped

Build Status: Buildable 73959; Build 70842: arc lint + arc unit

Event Timeline

raghavendra.kt_amd.com created this revision. (Thu, Jun 18, 7:06 AM)

Herald added a reviewer: gnn. (Thu, Jun 18, 7:06 AM)

raghavendra.kt_amd.com requested review of this revision. (Thu, Jun 18, 7:06 AM)

Harbormaster completed remote builds in B73959: Diff 179976. (Thu, Jun 18, 7:06 AM)

raghavendra.kt_amd.com added a parent revision: D57630: sys: hwpmc PMU group/multiplex ABI additions. (Thu, Jun 18, 7:06 AM)

raghavendra.kt_amd.com added a child revision: D57632: hwpmc: add PMU assigner and AMD Zen constraint provider.

mhorne added a reviewer: pmc. (Tue, Jun 30, 3:18 PM)

ali_mashtizadeh.com requested changes to this revision. (Tue, Jun 30, 3:47 PM)

ali_mashtizadeh.com added inline comments.

sys/dev/hwpmc/hwpmc_pmu.c
99	In your hwpmc_mod.c code you look up the userspace identifiers and ensure that they aren't NULL, so it's safe to replace these checks with an assert.
151	Same.
402	FreeBSD style wants the variables defined at the top. Alternatively, reuse error and use __maybe_unused.
890	Sort local variables largest to smallest.
905	Again, define all variables at the top of the function, but initialize them before use.
sys/dev/hwpmc/hwpmc_pmu.h
126	Unless you plan to drop this requirement, I think you should just say it requires the get_sched_constraint call.
142	Why are we hard-coding the can_assign_pmc and get_sched_constraint functions? Could we not expose them through the pmc_classdep structure? That way you can remove the #defines and just call a wrapper that returns EOPNOTSUPP when that function pointer in the structure is NULL. The only additional problem I see is that you need to look up the class of the counter.
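The classdep-based dispatch suggested in the comment on line 142 might look like the following sketch. The real struct pmc_classdep is not modeled here. struct classdep_sketch, its get_sched_constraint member, pmu_get_sched_constraint() and example_amd_constraint() are all hypothetical, and the lookup of the counter's class is left to the caller. A backend that leaves the pointer NULL, such as one with no constraint provider yet, gets EOPNOTSUPP without any #ifdef.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-ins for the real hwpmc types. */
struct sched_constraint {
	uint32_t	allowed_rows;
};

struct classdep_sketch {
	/* NULL for backends without a constraint provider. */
	int	(*get_sched_constraint)(int event, struct sched_constraint *sc);
};

/* Wrapper: dispatch through the class hook, or report EOPNOTSUPP. */
static int
pmu_get_sched_constraint(const struct classdep_sketch *pcd, int event,
    struct sched_constraint *sc)
{
	assert(pcd != NULL);
	if (pcd->get_sched_constraint == NULL)
		return (EOPNOTSUPP);
	return (pcd->get_sched_constraint(event, sc));
}

/* Example provider: every event may use rows 0-5 (placeholder policy). */
static int
example_amd_constraint(int event, struct sched_constraint *sc)
{
	(void)event;
	sc->allowed_rows = 0x3f;
	return (0);
}
```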

This revision now requires changes to proceed. (Tue, Jun 30, 3:47 PM)

raghavendra.kt_amd.com added inline comments. (Mon, Jul 6, 7:46 AM)

sys/dev/hwpmc/hwpmc_pmu.c
99	> In your hwpmc_mod.c code you look up the userspace identifiers and ensure that they aren't NULL, so it's safe to replace these checks with an assert.
	Agreed; I will remove the NULL check. Unless we fuzz pmu_group_add separately, this scenario should not occur.
402	> FreeBSD style wants the variables defined at the top. Alternatively, reuse error and use __maybe_unused.
	Okay. I am thinking of reusing error.
890	> Sort local variables largest to smallest.
	Sorry for missing that. Thank you for pointing it out.
905	> Again, define all variables at the top of the function, but initialize them before use.
	Will do.
sys/dev/hwpmc/hwpmc_pmu.h
126	> Unless you plan to drop this requirement, I think you should just say it requires the get_sched_constraint call.
	Thanks for the review. Agreed.
142	> Why are we hard-coding the can_assign_pmc and get_sched_constraint functions? Could we not expose them through the pmc_classdep structure? That way you can remove the #defines and just call a wrapper that returns EOPNOTSUPP when that function pointer in the structure is NULL. The only additional problem I see is that you need to look up the class of the counter.
	Good point. Let me also think about what we should do for ARM/PPC later.