Page MenuHomeFreeBSD

amdsmu: Get and expose sysctls for metrics about last sleep
ClosedPublic

Authored by obiwac on Jan 29 2025, 4:14 PM.
Tags
None
Referenced Files
F125033323: D48714.id159194.diff
Sat, Aug 2, 4:44 PM
Unknown Object (File)
Sat, Aug 2, 5:32 AM
Unknown Object (File)
Thu, Jul 31, 9:38 AM
Unknown Object (File)
Mon, Jul 28, 12:30 PM
Unknown Object (File)
Mon, Jul 28, 2:05 AM
Unknown Object (File)
Mon, Jul 28, 1:58 AM
Unknown Object (File)
Mon, Jul 28, 1:28 AM
Unknown Object (File)
Sun, Jul 27, 9:28 AM
Subscribers

Details

Summary

Get "log" address with the SMU_MSG_LOG_GETDRAM_ADDR_HI/LO commands. From this, we dump & read metrics about residency in various sleep states (SW DRIPS, S0i2, and S0i3), as well as active durations for the IP blocks on the CPU during the last sleep. On Linux, these fields are called "time condition not met", but I couldn't find any information about what these conditions are. All these metrics are exposed under the dev.amdsmu.0.metrics node.

This is useful for debugging sleep as the residency registers in the _LPI object are not populated (at least on my Phoenix machine).

A dev.amdsmu.0.ip_blocks.XXX node is also created for each IP block, with children for the current active state and time spent active during the last sleep.

I will add an amdsmu(4) manpage on this stack describing what I know about the SMU and documenting these sysctls.

Test Plan

I have tested this on the Framework 13 AMD Ryzen 7040 series (Phoenix), with SMU firmware version 76.70.0:

% sysctl dev.amdsmu.0
dev.amdsmu.0.metrics.total_time_in_sw_drips: 0
dev.amdsmu.0.metrics.time_last_in_sw_drips: 0
dev.amdsmu.0.metrics.total_time_in_s0i3: 0
dev.amdsmu.0.metrics.time_last_in_s0i3: 0
dev.amdsmu.0.metrics.total_time_resuming: 0
dev.amdsmu.0.metrics.time_last_resuming: 0
dev.amdsmu.0.metrics.total_time_entering_s0i3: 0
dev.amdsmu.0.metrics.time_last_entering_s0i3: 0
dev.amdsmu.0.metrics.time_last_in_s0i2: 0
dev.amdsmu.0.metrics.s0i3_last_entry_status: 0
dev.amdsmu.0.metrics.hint_count: 0
dev.amdsmu.0.metrics.table_version: 3
dev.amdsmu.0.ip_blocks.UMSCH.last_time: 0
dev.amdsmu.0.ip_blocks.UMSCH.active: 1
dev.amdsmu.0.ip_blocks.IPU.last_time: 0
dev.amdsmu.0.ip_blocks.IPU.active: 1
dev.amdsmu.0.ip_blocks.JPEG.last_time: 0
dev.amdsmu.0.ip_blocks.JPEG.active: 1
dev.amdsmu.0.ip_blocks.MPM.last_time: 0
dev.amdsmu.0.ip_blocks.MPM.active: 1
dev.amdsmu.0.ip_blocks.USB4_1.last_time: 0
dev.amdsmu.0.ip_blocks.USB4_1.active: 1
dev.amdsmu.0.ip_blocks.USB4_0.last_time: 6442461721
dev.amdsmu.0.ip_blocks.USB4_0.active: 1
dev.amdsmu.0.ip_blocks.USB3_4.last_time: 0
dev.amdsmu.0.ip_blocks.USB3_4.active: 1
dev.amdsmu.0.ip_blocks.USB3_3.last_time: 0
dev.amdsmu.0.ip_blocks.USB3_3.active: 1
dev.amdsmu.0.ip_blocks.USB3_2.last_time: 0
dev.amdsmu.0.ip_blocks.USB3_2.active: 0
dev.amdsmu.0.ip_blocks.LAPIC.last_time: 0
dev.amdsmu.0.ip_blocks.LAPIC.active: 0
dev.amdsmu.0.ip_blocks.USB3_1.last_time: 0
dev.amdsmu.0.ip_blocks.USB3_1.active: 1
dev.amdsmu.0.ip_blocks.USB3_0.last_time: 0
dev.amdsmu.0.ip_blocks.USB3_0.active: 1
dev.amdsmu.0.ip_blocks.DF.last_time: 0
dev.amdsmu.0.ip_blocks.DF.active: 1
dev.amdsmu.0.ip_blocks.NBIO.last_time: 0
dev.amdsmu.0.ip_blocks.NBIO.active: 0
dev.amdsmu.0.ip_blocks.ISP.last_time: 0
dev.amdsmu.0.ip_blocks.ISP.active: 1
dev.amdsmu.0.ip_blocks.VCN.last_time: 0
dev.amdsmu.0.ip_blocks.VCN.active: 1
dev.amdsmu.0.ip_blocks.ACP.last_time: 0
dev.amdsmu.0.ip_blocks.ACP.active: 1
dev.amdsmu.0.ip_blocks.VDD.last_time: 0
dev.amdsmu.0.ip_blocks.VDD.active: 1
dev.amdsmu.0.ip_blocks.GFX.last_time: 0
dev.amdsmu.0.ip_blocks.GFX.active: 0
dev.amdsmu.0.ip_blocks.CPU.last_time: 0
dev.amdsmu.0.ip_blocks.CPU.active: 0
dev.amdsmu.0.ip_blocks.DISPLAY.last_time: 0
dev.amdsmu.0.ip_blocks.DISPLAY.active: 1
dev.amdsmu.0.version_revision: 0
dev.amdsmu.0.version_minor: 70
dev.amdsmu.0.version_major: 76
dev.amdsmu.0.program: 0
dev.amdsmu.0.%parent: hostb0
dev.amdsmu.0.%pnpinfo: 
dev.amdsmu.0.%location: 
dev.amdsmu.0.%driver: amdsmu
dev.amdsmu.0.%desc:

Note that I'm not getting any of the total active times for the IP blocks, as they return garbage on my system. Also the USB4_0 IP block active time during last sleep fluctuates between garbage and the total time spent in SW DRIPS for some reason. I'm getting the same results on Linux so I'm going to chalk this up to firmware weirdness.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 62090
Build 58974: arc lint + arc unit

Event Timeline

obiwac created this revision.

Explicitly clear added_* variables

sys/dev/amdsmu/amdsmu.c
420–429

Like in the later diff -- this pattern seems funky to me. We should just create the nodes in attach(), once, and skip this added_metrics bool.

511–515

Do we need to unmap smu_space in this failure path?

sys/dev/amdsmu/amdsmu.c
507–514

Suggest moving these sc->res = NULL changes up to the previous diff in the stack (which introduces the code being modified).

obiwac marked 2 inline comments as done.

style(9) alignment and update comment

obiwac marked 2 inline comments as not done.Thu, Jul 24, 11:56 PM

Thanks for the review!

sys/dev/amdsmu/amdsmu.c
420–429

I think the main idea was not to create these nodes if somehow SMU communication failed, because I don't know what I'd set them to otherwise.

I guess I can just zero them as that's what they'd be set to before attempting S0i3 anyway.

507–514

I don't actually remember why I was doing this and don't think its necessary.

sys/dev/amdsmu/amdsmu.c
420–429

It's fine to avoid adding the nodes if SMU communication fails, I would just organize the logic differently.

int amdsmu_dump_metrics(...) {
  ...
  if (cmd fails) {
    return error;
  }
  ...
  return 0;
}

int attach() {
  int error = dump_metrics();
  if (error == 0) {
    add sysctl nodes;
  }
}

Something like that.

obiwac marked 2 inline comments as done.

Restructure sysctl adding code.

sys/dev/amdsmu/amdsmu.c
420–429

Ah right gotcha

kib added inline comments.
sys/dev/amdsmu/amdsmu.c
342
348
372

So is this a main memory, or some device memory?
If it is the main memory, and SMU writes are coherent, then using bus_space_map() might be not great, because it ends up with non-cacheable mapping.

sys/dev/amdsmu/amdsmu.h
60 ↗(On Diff #159071)

I probably miss something reading the code, but how this array gets written to?

61 ↗(On Diff #159071)

Why hiding the member? I understand why you might want not to activate some code, but if structure needs some members sometimes, it is better to have them added unconditionally.

66 ↗(On Diff #159071)

Put them one per line.

obiwac added inline comments.
sys/dev/amdsmu/amdsmu.c
372

The Linux driver uses devm_ioremap to map this memory which I believe would imply SMU writes are not cache coherent. It is indeed in main memory though so I guess I could use pmap_mapdev_attr(..., VM_MEMATTR_UNCACHEABLE). Would you rather I use that or is it fine as is?

sys/dev/amdsmu/amdsmu.h
60 ↗(On Diff #159071)

It never gets explicitly written to. In amdsmu_dump_metrics, the metrics space gets read into sc->metrics which is of type struct amdsmu_metrics.

61 ↗(On Diff #159071)

This follows the Linux struct but from what I can tell ip_block_total_active_time just contains garbage, at least on Phoenix. I'm including the member of the struct so it's there if we need it later, but I don't want to actually have it part of the struct until I'm sure we're not reading extra memory we shouldn't.

If you have a preferred solution to this I'm all ears though.

obiwac marked 2 inline comments as done.

Trivial fixes & style(9).

sys/dev/amdsmu/amdsmu.c
372

I do not see a need in explicit use of pmap_mapdev(). The bus_space* is good enough for this.

Did you looked what is the value of metrics_addr, and where is it located in the memory map?

sys/dev/amdsmu/amdsmu.h
60 ↗(On Diff #159071)

It is being written to explicitly by bus_space_read_region(), I missed iit.

So the structure is hw-defined? If yes, it must go into amdsmu_reg.h.

Move metrics struct to amdsmu_reg.h.

This revision is now accepted and ready to land.Sat, Jul 26, 7:27 PM
obiwac added inline comments.
sys/dev/amdsmu/amdsmu.c
372

Yes, it isn't within any BAR regions and is just above the last ram0 memory address, so I'm pretty certain this is in DRAM and reserved by firmware (though I don't know how to 100% check this actually).

sys/dev/amdsmu/amdsmu.h
60 ↗(On Diff #159071)

Yeah sorry I meant the member itself is not explicitly being written to.

mckusick added a subscriber: mckusick.

Review process looks good and complete.
Adding my accept as your mentor.

This revision was automatically updated to reflect the committed changes.
obiwac marked an inline comment as done.