Page MenuHomeFreeBSD

arm/pmu: add ACPI attachment
ClosedPublic

Authored by br on Apr 14 2020, 8:00 PM.

Details

Summary

This makes hwpmc/pmcstat sampling work on ACPI (SBSA style) AArch64 systems.

Test Plan

I think I saw ACPI mode working on eMAG?? The interrupt does not fire on my MACCHIATObin though =(
(Armada8k has its PMU interrupts wired to the custom intr controller, but the firmware has a nice hack that's supposed to handle them in the secure monitor and "rethrow" them onto the GIC for the OS (fire the ones listed in MADT), and I've checked that my EDK2 build fires the SMC call enabling that and doesn't fire the disabling one, but it still doesn't work. I'm using Marvell's old-ish TF-A though because upstream TF-A doesn't work for me, but I haven't noticed much difference in that part of the code.)

Tested on: Marvell/SolidRun MACCHIATObin (Armada 8040, upstream EDK2, Marvell TF-A)
Tested on: ARM Neoverse N1 (ARM N1SDP)

Diff Detail

Repository
R10 FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Some of the indentation looks wrong. When splitting a long line the next line should be tabbed to the same level, then indented 4 spaces.

sys/arm/arm/pmu.c
171 ↗(On Diff #70573)

We should split the FDT specific code into a new file.

sys/arm/arm/pmu_acpi.c
56

This should be

/*
 * NOTE: ...
greg_unrelenting.technology edited the summary of this revision. (Show Details)
greg_unrelenting.technology edited the test plan for this revision. (Show Details)

Okay, it works for me now, yay!

So while fiddling with Trusted Firmware, I discovered that the original version of the Armada8k's interrupt-redirect-thing would configure the interrupt as edge-triggered on the GIC *in the handler* (on each interrupt) and this was dropped because Arm reviewers didn't like it and MADT worked to make the OS (Linux) configure it as edge triggered. I temporarily added that config code, and I was getting.. one interrupt when starting pmcstat (better than nothing!) so I debugged hwpmc stuff and realized that the interrupt was being handled on the wrong CPU. Added binding to fix that :) Back to the edge triggering, I realized it wasn't configured in FreeBSD, so I removed the config from firmware and rewrote the interrupt configuration to not use BUS_CONFIG_INTR (which is currently a stub on arm64) and use ACPI_BUS_MAP_INTR directly to pass the trigger mode.

BTW, I thought the binding might make multi-core support possible, so I tried uncommenting the notyet stuff, but nope, the interrupts for the other cores are not firing. (Though hwpmc seems to do everything correctly, the configuration happens on each core, it does thread binding in machine-independent code.) Actually maaaaybe that might be a weird hardware quirk as well, would be cool if someone tried on a machine that doesn't do weird things to throw PMU interrupts between different interrupt controllers in firmware :D

(Also split the fdt, hopefully fixed indentation)

mmel requested changes to this revision.May 19 2020, 3:44 PM
mmel added a subscriber: mmel.
mmel added inline comments.
sys/arm/arm/pmu.c
147 ↗(On Diff #71000)

This is wrong. You may bind interrupt to CPU only if given PMU uses multiple per-core SPIs but not if it uses single PPI. Any attempt to bind PPI should generate panic.

sys/arm/arm/pmu_acpi.c
72

This is nothing but an unacceptable gross hack. In addition, it is a clear sign that there is something very broken in the ACPI crap. The interrupt resource for PMU should be parsed and instantiated in exactly same way that all other interrupts. If something is missing in MADT handling in ACPI driver, it should be fixed within it, not in some other driver.

This revision now requires changes to proceed.May 19 2020, 3:44 PM
sys/arm/arm/pmu.c
147 ↗(On Diff #71000)

The driver currently does not support more than one core.

Is there a good way to check if per-core interrupts are not used? Just check if the irq is not equal between entries?

I can add the condition now, but it would only be tested when multi-core support is added.

sys/arm/arm/pmu_acpi.c
72

In ACPI, the PMU is *not* described as a normal device with resources (that would be in DSDT).
From the perspective of ACPI, there is no such thing as a "PMU device", the PMU interrupt is just a field in the interrupt controller description.

Here's an example of what MADT is:

https://github.com/tianocore/edk2-platforms/blob/66d22b1b440631a8229ad52ef235d96cb5e8819b/Silicon/Marvell/Armada7k8k/AcpiTables/Madt.aslc

Sorry for late response, i'm busy by real life these days and you forced me to dig into problem much deeper than I wanted :).

sys/arm/arm/pmu.c
147 ↗(On Diff #71000)

That's not true. if interrupt is PPI, everything works. Otherwise counter overflow events are not handled, everything else is fine.

I afraid that intrng doesn't have such method yet, but I noted this in my DODO list.
Another problem is that you cannot expect that interrupt resource ID is equal with cpuid passed as last argument to bus_bind_intr. FDT uses following construct https://github.com/torvalds/linux/blob/master/arch/arm/boot/dts/tegra124.dtsi#L1183 for this case (im not sure if ACPI have something similar) but we are talking about common code.

sys/arm/arm/pmu_acpi.c
72

I understand. Please see how are the irregularities are handled in i386/amd64 world and where the code lives -> sys/x86/acpica/madt.c. At my best you should move PMU enumeration into appropriate arm64 directory (/sys/arm64/acpica/) and split it from real device driver.

sys/arm/arm/pmu.c
147 ↗(On Diff #71000)

Yeah so for now (in an update not published yet) I just have a comparison that determines whether the number is different.

you cannot expect that interrupt resource ID is equal with cpuid passed as last argument to bus_bind_intr

In ACPI code, I'm mapping interrupt intr->PerformanceInterrupt to resource ID intr->CpuInterfaceNumber, so I can expect this.

But yeah, looks like FDT code doesn't handle this.

Maybe this part of the function shouldn't be common anymore, and for now ACPI attach would do this, and FDT attach would just not bind, so it would be as it was before?

sys/arm/arm/pmu_acpi.c
72

sys/x86/acpica/madt.c

hm, all "irregularities" there seem to be interrupt controller quirks, not creating an unrelated device from data that exists in MADT.

move PMU enumeration into appropriate arm64 directory (/sys/arm64/acpica/)

Why?

gic_acpi.c lives in sys/arm/arm, and it's an arm64 thing. sys/arm/arm/generic_timer.c also has ACPI inside. So from these two examples it seems like the convention is for devices that share anything between 32 and 64 bit, everything lives in arm/arm.

sys/arm/arm/pmu.c
147 ↗(On Diff #71000)

No, sorry, that's not true. Internal cpuid is not coupled with ACPI id (which is stored in pcpu->pc_acpi_id) nor with GIC position. You always must do some translation irrespective if we are on ACPI or FDT system.
In short, you opened big pandora box. This requirement (an attempt to bind SPI interrupt to exact physical core) is new and intrng (and other parts of kernel) are not yet prepared for this situation.
And because PMU is not a critical part, so we should follow standard way how to solve this problem. We should implement missing pieces of intrng (query if interrupt is PPI) and/or kernel (availability of cpuid translation api.
Feel free to start these changes or wait until I or someone else will find time to do it.

sys/arm/arm/pmu_acpi.c
72

Well, in attempt to unlock this "clash", my biggest problem is usage of ACPI_BUS_MAP_INTR() in this context. This method is part of resources allocation protocol and should be called only within it contract. (As you can see nobody use this method in similar context). Also accessing parent device ivars from child is clear layering violation.
And sorry, I’m again out of time. We may continue this discuss later today if you want more ‘information’.

sys/arm/arm/pmu.c
147 ↗(On Diff #71000)

Oh, right, CpuInterfaceNumber is wrong.

MADT position should be correct though — sys/arm64/arm64/mp_machdep.c starts CPUs and increments cpuid exactly in the order of walking MADT!

pmu_acpi is an arm64 specific attachment, it should be okay for it to rely on arm64 cpu initialization order.
Translation API feels like over-engineering ¯\_(ツ)_/¯

Also I just realized (*facepalm*) no need to compare whether the interrupt is the same, the number ranges are telling enough.
For this, an API makes a lot more sense than for the cpu order thing, but I'd still just do a GIC_FIRST_PPI / GIC_LAST_PPI comparison directly in pmu_acpi at first because it's easy :P (And doesn't make any wrong assumptions. arm64 ACPI systems will *always* have a GIC. Nobody is going to support third party interrupt controllers in ACPI.)

sys/arm/arm/pmu_acpi.c
180

Why is this in the ACPI function? It looks like it would be safe in the parent as the flag will never be set in the FDT code.

Rebased, switched to MPIDR for cpuid finding

I tested this recently on an ARM64 ACPI box, and ran into a panic at boot:

pmu0: <Performance Monitoring Unit> on acpi0
pmu0: MADT: cpu 0 (mpidr 1179648) irq 23 level-triggered
pmu0: MADT: cpu 1 (mpidr 1703936) irq 23 level-triggered
pmu0: MADT: cpu 2 (mpidr 1310720) irq 23 level-triggered
pmu0: MADT: cpu 3 (mpidr 1835008) irq 23 level-triggered
pmu0: MADT: cpu 4 (mpidr 1048576) irq 23 level-triggered
pmu0: MADT: cpu 5 (mpidr 1572864) irq 23 level-triggered
pmu0: MADT: cpu 6 (mpidr 1441792) irq 23 level-triggered
pmu0: MADT: cpu 7 (mpidr 1966080) irq 23 level-triggered
pmu0: MADT: cpu 8 (mpidr 655360) irq 23 level-triggered
pmu0: MADT: cpu 9 (mpidr 2228224) irq 23 level-triggered
pmu0: MADT: cpu 10 (mpidr 786432) irq 23 level-triggered
pmu0: MADT: cpu 11 (mpidr 2359296) irq 23 level-triggered
pmu0: MADT: cpu 12 (mpidr 524288) irq 23 level-triggered
pmu0: MADT: cpu 13 (mpidr 2097152) irq 23 level-triggered
Fatal data abort:
  x0:                0
  x1: ffffa0001d8103a8
  x2:                0
  x3:                a
  x4: ffff000000721fc0
  x5:               10
  x6:                f
  x7:              254
  x8: ffffa0001d80fe00
  x9: ffff000000aad928
 x10:                1
 x11:                0
 x12:                0
 x13:                4
 x14:               80
 x15:                1
 x16:              700
 x17:                3
 x18: ffff000000f277e0
 x19: ffff000000f27870
 x20: ffff0002023fd43c
 x21:                d
 x22: ffffa0001d80fe00
 x23: ffffa0001d80bc00
 x24:               17
 x25: ffff000000823cd6
 x26: ffffa0001d80bb00
 x27: ffffa0001d805af0
 x28:                1
 x29: ffff000000f277f0
  sp: ffff000000f277e0
  lr: ffff000000720860
 elr: ffff000000720888
spsr:         604001c5
 far:                0
 esr:         96000004
panic: vm_fault failed: ffff000000720888
cpuid = 0
time = 1
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x184
panic() at panic+0x44
data_abort() at data_abort+0x1d8
handle_el1h_sync() at handle_el1h_sync+0x74
--- exception, esr 0x96000004
madt_handler() at madt_handler+0xe8
acpi_walk_subtables() at acpi_walk_subtables+0x3c
pmu_acpi_attach() at pmu_acpi_attach+0x64
device_attach() at device_attach+0x400
bus_generic_new_pass() at bus_generic_new_pass+0x11c
bus_generic_new_pass() at bus_generic_new_pass+0xac
bus_generic_new_pass() at bus_generic_new_pass+0xac
root_bus_configure() at root_bus_configure+0x40
mi_startup() at mi_startup+0x22c
virtdone() at virtdone+0x6c
KDB: enter: panic
[ thread pid 0 tid 100000 ]
Stopped at      kdb_enter+0x40: undefined       d4200000

Splitting out the FDT code and adding the ACPI bindings should probably be done in separate commits, and there's nothing stopping the former from landing whilst the latter is iterated on. That keeps it easier to review things.

Never mind, somehow ended up on an earlier version of the diff...

sys/arm/arm/pmu.c
147 ↗(On Diff #71000)

GICv3.1 adds support for 64 extended PPIs (and 1024 extended SPIs); just because there will be a GIC doesn't mean it'll always be a GICv3 and nothing new will come along that breaks assumptions made outside the GIC driver itself once the GIC driver starts to use those new features.

I tested this recently on an ARM64 ACPI box, and ran into a panic at boot:

pmu0: <Performance Monitoring Unit> on acpi0
pmu0: MADT: cpu 0 (mpidr 1179648) irq 23 level-triggered
pmu0: MADT: cpu 1 (mpidr 1703936) irq 23 level-triggered
pmu0: MADT: cpu 2 (mpidr 1310720) irq 23 level-triggered
pmu0: MADT: cpu 3 (mpidr 1835008) irq 23 level-triggered
pmu0: MADT: cpu 4 (mpidr 1048576) irq 23 level-triggered
pmu0: MADT: cpu 5 (mpidr 1572864) irq 23 level-triggered
pmu0: MADT: cpu 6 (mpidr 1441792) irq 23 level-triggered
pmu0: MADT: cpu 7 (mpidr 1966080) irq 23 level-triggered
pmu0: MADT: cpu 8 (mpidr 655360) irq 23 level-triggered
pmu0: MADT: cpu 9 (mpidr 2228224) irq 23 level-triggered
pmu0: MADT: cpu 10 (mpidr 786432) irq 23 level-triggered
pmu0: MADT: cpu 11 (mpidr 2359296) irq 23 level-triggered
pmu0: MADT: cpu 12 (mpidr 524288) irq 23 level-triggered
pmu0: MADT: cpu 13 (mpidr 2097152) irq 23 level-triggered
Fatal data abort:
  x0:                0
  x1: ffffa0001d8103a8
  x2:                0
  x3:                a
  x4: ffff000000721fc0
  x5:               10
  x6:                f
  x7:              254
  x8: ffffa0001d80fe00
  x9: ffff000000aad928
 x10:                1
 x11:                0
 x12:                0
 x13:                4
 x14:               80
 x15:                1
 x16:              700
 x17:                3
 x18: ffff000000f277e0
 x19: ffff000000f27870
 x20: ffff0002023fd43c
 x21:                d
 x22: ffffa0001d80fe00
 x23: ffffa0001d80bc00
 x24:               17
 x25: ffff000000823cd6
 x26: ffffa0001d80bb00
 x27: ffffa0001d805af0
 x28:                1
 x29: ffff000000f277f0
  sp: ffff000000f277e0
  lr: ffff000000720860
 elr: ffff000000720888
spsr:         604001c5
 far:                0
 esr:         96000004
panic: vm_fault failed: ffff000000720888
cpuid = 0
time = 1
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x184
panic() at panic+0x44
data_abort() at data_abort+0x1d8
handle_el1h_sync() at handle_el1h_sync+0x74
--- exception, esr 0x96000004
madt_handler() at madt_handler+0xe8
acpi_walk_subtables() at acpi_walk_subtables+0x3c
pmu_acpi_attach() at pmu_acpi_attach+0x64
device_attach() at device_attach+0x400
bus_generic_new_pass() at bus_generic_new_pass+0x11c
bus_generic_new_pass() at bus_generic_new_pass+0xac
bus_generic_new_pass() at bus_generic_new_pass+0xac
root_bus_configure() at root_bus_configure+0x40
mi_startup() at mi_startup+0x22c
virtdone() at virtdone+0x6c
KDB: enter: panic
[ thread pid 0 tid 100000 ]
Stopped at      kdb_enter+0x40: undefined       d4200000

For the FDT-based PMU bindings, the bindings map from the PMU to a list interrupts, and there are two ways it works. Either you have a single interrupts entry with a PPI, and that is interpreted to mean that PPI is used across all CPUs, or you have a list of SPIs. Both Linux and FreeBSD will balk if it sees more than one interrupt where at least one is a PPI (although it looks like FreeBSD will silently ignore the rest of them if the first is a PPI, whereas Linux does check the number of interrupts before its early bail-out). This means you either parse 1 PPI, or N SPIs, where N is the number of interrupt entries (which I assume has to be MAXCPU? or at least the number that support PMU interrupts), but crucially FreeBSD will cap that at MAX_RLEN, which is defined to 8. Presumably for the high core-count systems vendors are sensible enough that they don't go and allocate MAXCPU SPIs and instead just use a single common PPI[*], and thus FDT-based booting is happy here despite the cap (though you wouldn't crash, just only be able to get overflow interrupts from a subset of CPUs).

For the ACPI-based PMU bindings, the IRQs are stored in the MADT per CPU, there is no flexibility in the way you express it, you just fill in the field as part of the CPU's GIC information (the GIC CPU Interface, or GICC, structure). This means you always end up with MAXCPU entries; even in the case that a PPI is being used, you end up with MAXCPU copies describing the same PPI resource.

Since you have more than 8 CPUs in your system, you're overflowing the fixed-size MAX_RLEN array in the softc, which is presumably why you get a fatal page fault. I imagine bumping that up to MAXCPU would fix the issue (and support crazy systems that allocate more than 8 SPIs just for the PMU). Though I'm not sure if we shouldn't also be unifying the representation across FDT and ACPI (i.e. avoiding the 1 vs N difference when a shared PPI is used), which would likely mean flipping the table to be CPU->IRQ rather than "id"->CPU+IRQ. That would make the FDT-based binding implementation less efficient time-wise (but not space, you still need the table to be big enough for MAXCPU, and flipping the table would then *decrease* its size as the CPU would be implied by the index and thus not need storing any more) as you'd end up splatting out a bunch of copies of the same PPI (either by allocating the same resource N times to match what ACPI ends up doing, or by making N copies of the same resource, but then you need to be careful with the error path in pmu_attach to avoid double-teardown). If doing that, it probably makes sense to do that in a separate patch as it's not strictly needed for ACPI to work. For what it's worth, CPU->IRQ is what Linux ends up doing for both, which makes sense to me.

[*] Rockchip RK3368 and a couple of UNISOC FDTs go up to 8 cores and 8 PMU SPIs, but I those are the largest I see our (well, our imported from Linux) vendored device trees

br updated this revision to Diff 97492.
br added a reviewer: greg_unrelenting.technology.

o Since INTRNG abuses 'virtual' property of a resource to store interrupt config, use it to set desired interrupt configuration here.
o Bump MAX_RLEN to MAXCPU

br retitled this revision from arm/pmu: add ACPI attachment, more FDT names to arm/pmu: add ACPI attachment.Oct 26 2021, 5:41 PM
br edited the test plan for this revision. (Show Details)

I've tested this on ARM Neoverse N1 (ARM N1SDP) and it works fine (both sampling and counting modes).

The current version looks good to me. If you give me a day or two I can test on the FF's eMAG, and we should give @mmel a chance to make any objections known.

sys/arm/arm/pmu_acpi.c
86

Nit: most drivers prefer the bus_set_resource() wrapper.

94–104

This is tricky, and at first I thought it did not work. But it seems like an okay way to deal with this special case :)

sys/arm/arm/pmu_acpi.c
5

Should probably be the other way round?

74

What if we can't find the CPU? Seems to me that should either be an error that fails the attachment or a warning and early return in order to at least attach for the other CPUs (I prefer the former though otherwise you could end up reporting success but be unable to attach for any CPU)

92

This needs to be a proper error, currently it (a) continues on to rman_get_virtual of the NULL pointer (b) doesn't return an error from the attach function.

99

Are there cases where this is not true, and if so is silently continuing the right thing to do? I'm assuming this should always be true and thus should be a KASSERT, or maybe even conditional panic. That or fail the attachment with an error like any other "could not map resource" kind of error.

109

This isn't quite right for GICv3.1, which adds more "extended" PPIs (1056-1119) and SPIs (4096-5119), so it'd be nice to centralise this logic in the GIC driver itself rather than also baked in here such that adding GICv3.1 support is localised to the GIC driver; maybe a macro or inline function in gic_common.h where this macro comes from?

172–176

It's extremely underused, but this can be written more succinctly as:

DEFINE_CLASS_0(pmu, pmu_acpi_driver, pmu_acpi_methods, sizeof(struct pmu_softc));

Address jrtc27's and mhorne's notes

sys/arm/arm/pmu_acpi.c
121

Could we use intr_is_per_cpu here? This driver shouldn't need to know about GIC internal details.

Was it intentional you reverted MAX_RLEN back to 8?

Restore MAX_RLEN change

Was it intentional you reverted MAX_RLEN back to 8?

missed, thanks@

Fix bug: Initialize cpuid to -1.
Also for PPI we need just one resource, so stop iteration over ACPI tables if PMU PPI interrupt found.

I don't think there's any requirement that each CPU use the same PPI, unlike for the FDT bindings where that's an assumption made by both Linux and FreeBSD?

(Though we're not at all set up to be able to handle that if that's true...)

Collect all PPIs since they indeed could be different on different CPUs

mhorne added inline comments.
sys/arm/arm/pmu_acpi.c
73–79

It would be useful to create a general helper function somewhere for this mpidr -> cpuid conversion, but this is a TODO item at best.

Typo to fix on commit but otherwise looks fine to me

sys/arm/arm/pmu_acpi.c
83
This revision was not accepted when it landed; it landed in state Needs Review.Tue, Nov 2, 7:51 PM
This revision was automatically updated to reflect the committed changes.