This makes hwpmc/pmcstat sampling work on ACPI (SBSA style) AArch64 systems.
Details
- Reviewers
andrew emaste mmel gonzo mhorne val_packett.cool jrtc27 - Group Reviewers
arm64 Contributor Reviews (src) - Commits
- rG4bb6991531b5: arm/pmu: add ACPI attachment.
I think I saw ACPI mode working on eMAG?? The interrupt does not fire on my MACCHIATObin though =(
(Armada8k has its PMU interrupts wired to the custom intr controller, but the firmware has a nice hack that's supposed to handle them in the secure monitor and "rethrow" them onto the GIC for the OS (fire the ones listed in MADT), and I've checked that my EDK2 build fires the SMC call enabling that and doesn't fire the disabling one, but it still doesn't work. I'm using Marvell's old-ish TF-A though because upstream TF-A doesn't work for me, but I haven't noticed much difference in that part of the code.)
Tested on: Marvell/SolidRun MACCHIATObin (Armada 8040, upstream EDK2, Marvell TF-A)
Tested on: ARM Neoverse N1 (ARM N1SDP)
Diff Detail
- Repository
- rG FreeBSD src repository
- Lint
Lint Not Applicable - Unit
Tests Not Applicable
Event Timeline
Okay, it works for me now, yay!
So while fiddling with Trusted Firmware, I discovered that the original version of the Armada8k's interrupt-redirect-thing would configure the interrupt as edge-triggered on the GIC *in the handler* (on each interrupt) and this was dropped because Arm reviewers didn't like it and MADT worked to make the OS (Linux) configure it as edge triggered. I temporarily added that config code, and I was getting.. one interrupt when starting pmcstat (better than nothing!) so I debugged hwpmc stuff and realized that the interrupt was being handled on the wrong CPU. Added binding to fix that :) Back to the edge triggering, I realized it wasn't configured in FreeBSD, so I removed the config from firmware and rewrote the interrupt configuration to not use BUS_CONFIG_INTR (which is currently a stub on arm64) and use ACPI_BUS_MAP_INTR directly to pass the trigger mode.
BTW, I thought the binding might make multi-core support possible, so I tried uncommenting the notyet stuff, but nope, the interrupts for the other cores are not firing. (Though hwpmc seems to do everything correctly, the configuration happens on each core, it does thread binding in machine-independent code.) Actually maaaaybe that might be a weird hardware quirk as well, would be cool if someone tried on a machine that doesn't do weird things to throw PMU interrupts between different interrupt controllers in firmware :D
(Also split the fdt, hopefully fixed indentation)
sys/arm/arm/pmu.c | ||
---|---|---|
147 ↗ | (On Diff #71000) | This is wrong. You may bind interrupt to CPU only if given PMU uses multiple per-core SPIs but not if it uses single PPI. Any attempt to bind PPI should generate panic. |
sys/arm/arm/pmu_acpi.c | ||
73 | This is nothing but an unacceptable gross hack. In addition, it is a clear sign that there is something very broken in the ACPI crap. The interrupt resource for PMU should be parsed and instantiated in exactly same way that all other interrupts. If something is missing in MADT handling in ACPI driver, it should be fixed within it, not in some other driver. |
sys/arm/arm/pmu.c | ||
---|---|---|
147 ↗ | (On Diff #71000) | The driver currently does not support more than one core. Is there a good way to check if per-core interrupts are not used? Just check if the irq is not equal between entries? I can add the condition now, but it would only be tested when multi-core support is added. |
sys/arm/arm/pmu_acpi.c | ||
73 | In ACPI, the PMU is *not* described as a normal device with resources (that would be in DSDT). Here's an example of what MADT is: |
Sorry for late response, i'm busy by real life these days and you forced me to dig into problem much deeper than I wanted :).
sys/arm/arm/pmu.c | ||
---|---|---|
147 ↗ | (On Diff #71000) | That's not true. if interrupt is PPI, everything works. Otherwise counter overflow events are not handled, everything else is fine. I afraid that intrng doesn't have such method yet, but I noted this in my DODO list. |
sys/arm/arm/pmu_acpi.c | ||
73 | I understand. Please see how are the irregularities are handled in i386/amd64 world and where the code lives -> sys/x86/acpica/madt.c. At my best you should move PMU enumeration into appropriate arm64 directory (/sys/arm64/acpica/) and split it from real device driver. |
sys/arm/arm/pmu.c | ||
---|---|---|
147 ↗ | (On Diff #71000) | Yeah so for now (in an update not published yet) I just have a comparison that determines whether the number is different.
In ACPI code, I'm mapping interrupt intr->PerformanceInterrupt to resource ID intr->CpuInterfaceNumber, so I can expect this. But yeah, looks like FDT code doesn't handle this. Maybe this part of the function shouldn't be common anymore, and for now ACPI attach would do this, and FDT attach would just not bind, so it would be as it was before? |
sys/arm/arm/pmu_acpi.c | ||
73 |
hm, all "irregularities" there seem to be interrupt controller quirks, not creating an unrelated device from data that exists in MADT.
Why? gic_acpi.c lives in sys/arm/arm, and it's an arm64 thing. sys/arm/arm/generic_timer.c also has ACPI inside. So from these two examples it seems like the convention is for devices that share anything between 32 and 64 bit, everything lives in arm/arm. |
sys/arm/arm/pmu.c | ||
---|---|---|
147 ↗ | (On Diff #71000) | No, sorry, that's not true. Internal cpuid is not coupled with ACPI id (which is stored in pcpu->pc_acpi_id) nor with GIC position. You always must do some translation irrespective if we are on ACPI or FDT system. |
sys/arm/arm/pmu_acpi.c | ||
73 | Well, in attempt to unlock this "clash", my biggest problem is usage of ACPI_BUS_MAP_INTR() in this context. This method is part of resources allocation protocol and should be called only within it contract. (As you can see nobody use this method in similar context). Also accessing parent device ivars from child is clear layering violation. |
sys/arm/arm/pmu.c | ||
---|---|---|
147 ↗ | (On Diff #71000) | Oh, right, CpuInterfaceNumber is wrong. MADT position should be correct though — sys/arm64/arm64/mp_machdep.c starts CPUs and increments cpuid exactly in the order of walking MADT! pmu_acpi is an arm64 specific attachment, it should be okay for it to rely on arm64 cpu initialization order. Also I just realized (*facepalm*) no need to compare whether the interrupt is the same, the number ranges are telling enough. |
sys/arm/arm/pmu_acpi.c | ||
---|---|---|
181 | Why is this in the ACPI function? It looks like it would be safe in the parent as the flag will never be set in the FDT code. |
I tested this recently on an ARM64 ACPI box, and ran into a panic at boot:
pmu0: <Performance Monitoring Unit> on acpi0 pmu0: MADT: cpu 0 (mpidr 1179648) irq 23 level-triggered pmu0: MADT: cpu 1 (mpidr 1703936) irq 23 level-triggered pmu0: MADT: cpu 2 (mpidr 1310720) irq 23 level-triggered pmu0: MADT: cpu 3 (mpidr 1835008) irq 23 level-triggered pmu0: MADT: cpu 4 (mpidr 1048576) irq 23 level-triggered pmu0: MADT: cpu 5 (mpidr 1572864) irq 23 level-triggered pmu0: MADT: cpu 6 (mpidr 1441792) irq 23 level-triggered pmu0: MADT: cpu 7 (mpidr 1966080) irq 23 level-triggered pmu0: MADT: cpu 8 (mpidr 655360) irq 23 level-triggered pmu0: MADT: cpu 9 (mpidr 2228224) irq 23 level-triggered pmu0: MADT: cpu 10 (mpidr 786432) irq 23 level-triggered pmu0: MADT: cpu 11 (mpidr 2359296) irq 23 level-triggered pmu0: MADT: cpu 12 (mpidr 524288) irq 23 level-triggered pmu0: MADT: cpu 13 (mpidr 2097152) irq 23 level-triggered Fatal data abort: x0: 0 x1: ffffa0001d8103a8 x2: 0 x3: a x4: ffff000000721fc0 x5: 10 x6: f x7: 254 x8: ffffa0001d80fe00 x9: ffff000000aad928 x10: 1 x11: 0 x12: 0 x13: 4 x14: 80 x15: 1 x16: 700 x17: 3 x18: ffff000000f277e0 x19: ffff000000f27870 x20: ffff0002023fd43c x21: d x22: ffffa0001d80fe00 x23: ffffa0001d80bc00 x24: 17 x25: ffff000000823cd6 x26: ffffa0001d80bb00 x27: ffffa0001d805af0 x28: 1 x29: ffff000000f277f0 sp: ffff000000f277e0 lr: ffff000000720860 elr: ffff000000720888 spsr: 604001c5 far: 0 esr: 96000004 panic: vm_fault failed: ffff000000720888 cpuid = 0 time = 1 KDB: stack backtrace: db_trace_self() at db_trace_self db_trace_self_wrapper() at db_trace_self_wrapper+0x30 vpanic() at vpanic+0x184 panic() at panic+0x44 data_abort() at data_abort+0x1d8 handle_el1h_sync() at handle_el1h_sync+0x74 --- exception, esr 0x96000004 madt_handler() at madt_handler+0xe8 acpi_walk_subtables() at acpi_walk_subtables+0x3c pmu_acpi_attach() at pmu_acpi_attach+0x64 device_attach() at device_attach+0x400 bus_generic_new_pass() at bus_generic_new_pass+0x11c bus_generic_new_pass() at bus_generic_new_pass+0xac bus_generic_new_pass() at bus_generic_new_pass+0xac root_bus_configure() at root_bus_configure+0x40 mi_startup() at mi_startup+0x22c virtdone() at virtdone+0x6c KDB: enter: panic [ thread pid 0 tid 100000 ] Stopped at kdb_enter+0x40: undefined d4200000
Splitting out the FDT code and adding the ACPI bindings should probably be done in separate commits, and there's nothing stopping the former from landing whilst the latter is iterated on. That keeps it easier to review things.
Never mind, somehow ended up on an earlier version of the diff...
sys/arm/arm/pmu.c | ||
---|---|---|
147 ↗ | (On Diff #71000) | GICv3.1 adds support for 64 extended PPIs (and 1024 extended SPIs); just because there will be a GIC doesn't mean it'll always be a GICv3 and nothing new will come along that breaks assumptions made outside the GIC driver itself once the GIC driver starts to use those new features. |
For the FDT-based PMU bindings, the bindings map from the PMU to a list interrupts, and there are two ways it works. Either you have a single interrupts entry with a PPI, and that is interpreted to mean that PPI is used across all CPUs, or you have a list of SPIs. Both Linux and FreeBSD will balk if it sees more than one interrupt where at least one is a PPI (although it looks like FreeBSD will silently ignore the rest of them if the first is a PPI, whereas Linux does check the number of interrupts before its early bail-out). This means you either parse 1 PPI, or N SPIs, where N is the number of interrupt entries (which I assume has to be MAXCPU? or at least the number that support PMU interrupts), but crucially FreeBSD will cap that at MAX_RLEN, which is defined to 8. Presumably for the high core-count systems vendors are sensible enough that they don't go and allocate MAXCPU SPIs and instead just use a single common PPI[*], and thus FDT-based booting is happy here despite the cap (though you wouldn't crash, just only be able to get overflow interrupts from a subset of CPUs).
For the ACPI-based PMU bindings, the IRQs are stored in the MADT per CPU, there is no flexibility in the way you express it, you just fill in the field as part of the CPU's GIC information (the GIC CPU Interface, or GICC, structure). This means you always end up with MAXCPU entries; even in the case that a PPI is being used, you end up with MAXCPU copies describing the same PPI resource.
Since you have more than 8 CPUs in your system, you're overflowing the fixed-size MAX_RLEN array in the softc, which is presumably why you get a fatal page fault. I imagine bumping that up to MAXCPU would fix the issue (and support crazy systems that allocate more than 8 SPIs just for the PMU). Though I'm not sure if we shouldn't also be unifying the representation across FDT and ACPI (i.e. avoiding the 1 vs N difference when a shared PPI is used), which would likely mean flipping the table to be CPU->IRQ rather than "id"->CPU+IRQ. That would make the FDT-based binding implementation less efficient time-wise (but not space, you still need the table to be big enough for MAXCPU, and flipping the table would then *decrease* its size as the CPU would be implied by the index and thus not need storing any more) as you'd end up splatting out a bunch of copies of the same PPI (either by allocating the same resource N times to match what ACPI ends up doing, or by making N copies of the same resource, but then you need to be careful with the error path in pmu_attach to avoid double-teardown). If doing that, it probably makes sense to do that in a separate patch as it's not strictly needed for ACPI to work. For what it's worth, CPU->IRQ is what Linux ends up doing for both, which makes sense to me.
[*] Rockchip RK3368 and a couple of UNISOC FDTs go up to 8 cores and 8 PMU SPIs, but I those are the largest I see our (well, our imported from Linux) vendored device trees
o Since INTRNG abuses 'virtual' property of a resource to store interrupt config, use it to set desired interrupt configuration here.
o Bump MAX_RLEN to MAXCPU
I've tested this on ARM Neoverse N1 (ARM N1SDP) and it works fine (both sampling and counting modes).
The current version looks good to me. If you give me a day or two I can test on the FF's eMAG, and we should give @mmel a chance to make any objections known.
sys/arm/arm/pmu_acpi.c | ||
---|---|---|
87 | Nit: most drivers prefer the bus_set_resource() wrapper. | |
95–105 | This is tricky, and at first I thought it did not work. But it seems like an okay way to deal with this special case :) |
sys/arm/arm/pmu_acpi.c | ||
---|---|---|
6 | Should probably be the other way round? | |
75 | What if we can't find the CPU? Seems to me that should either be an error that fails the attachment or a warning and early return in order to at least attach for the other CPUs (I prefer the former though otherwise you could end up reporting success but be unable to attach for any CPU) | |
93 | This needs to be a proper error, currently it (a) continues on to rman_get_virtual of the NULL pointer (b) doesn't return an error from the attach function. | |
100 | Are there cases where this is not true, and if so is silently continuing the right thing to do? I'm assuming this should always be true and thus should be a KASSERT, or maybe even conditional panic. That or fail the attachment with an error like any other "could not map resource" kind of error. | |
110 | This isn't quite right for GICv3.1, which adds more "extended" PPIs (1056-1119) and SPIs (4096-5119), so it'd be nice to centralise this logic in the GIC driver itself rather than also baked in here such that adding GICv3.1 support is localised to the GIC driver; maybe a macro or inline function in gic_common.h where this macro comes from? | |
173–177 | It's extremely underused, but this can be written more succinctly as: DEFINE_CLASS_0(pmu, pmu_acpi_driver, pmu_acpi_methods, sizeof(struct pmu_softc)); |
sys/arm/arm/pmu_acpi.c | ||
---|---|---|
122 | Could we use intr_is_per_cpu here? This driver shouldn't need to know about GIC internal details. |
Fix bug: Initialize cpuid to -1.
Also for PPI we need just one resource, so stop iteration over ACPI tables if PMU PPI interrupt found.
I don't think there's any requirement that each CPU use the same PPI, unlike for the FDT bindings where that's an assumption made by both Linux and FreeBSD?
sys/arm/arm/pmu_acpi.c | ||
---|---|---|
74–80 | It would be useful to create a general helper function somewhere for this mpidr -> cpuid conversion, but this is a TODO item at best. |