
arm64: Print per-CPU cache summary
Closed, Public

Authored by jhibbits on May 31 2022, 5:07 PM.

Details

Summary

It can be useful to see a summary of CPU caches on bootup. This is done
for most platforms already, so add this to arm64, in the form of (taken
from an Apple M1 Pro test):

L1 cache: 192KB (instruction), 128KB (data)
L2 cache: 12288KB (unified)

This is printed out per-CPU, only under bootverbose.

Future refinements could instead determine whether a cache level is shared
with other cores (L2 is shared among cores on some SoCs, for instance),
and calculate the true total cache size accordingly. For instance, it's
known that the M1 Pro, on which this test was done, has two 12MB L2
clusters, for a total of 24MB. Seeing each CPU report 12288KB of L2
could make one think that there's 12MB * NCPUs, possibly 120MB of
cache, which is incorrect.
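
The double counting described above can be illustrated with a small sketch. This is not code from the revision: the `struct cpu_l2` record and its cluster identifier are hypothetical, and how a kernel would actually discover which CPUs share an L2 is left open.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU record, for illustration only. */
struct cpu_l2 {
	unsigned cluster;	/* hypothetical ID of the L2 cluster this CPU uses */
	uint32_t l2_kb;		/* L2 size reported by this CPU, in KB */
};

/* Naive total: treats every CPU's reported L2 as private to that CPU. */
static uint64_t
l2_total_naive_kb(const struct cpu_l2 *cpus, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += cpus[i].l2_kb;
	return (total);
}

/* Cluster-aware total: counts the L2 of each distinct cluster once. */
static uint64_t
l2_total_by_cluster_kb(const struct cpu_l2 *cpus, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++) {
		int seen = 0;

		/* Skip this CPU if an earlier CPU is in the same cluster. */
		for (size_t j = 0; j < i; j++) {
			if (cpus[j].cluster == cpus[i].cluster) {
				seen = 1;
				break;
			}
		}
		if (!seen)
			total += cpus[i].l2_kb;
	}
	return (total);
}
```

With ten CPUs that each report 12288KB and are split across two clusters (an assumed layout), the naive sum gives 122880KB while the cluster-aware sum gives 24576KB.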

Sponsored by: Juniper Networks, Inc.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 45809
Build 42697: arc lint + arc unit

Event Timeline

It would be nice if the cache was the same on all cores; unfortunately, with big.LITTLE this isn't always the case. Some SoCs can have different cache sizes on different cores, e.g. rk3399 seems to have a 32k l1i, 32k l1d, 512k l2 on the small cores, and 48k l1i, 32k l1d, 1m l2 on the big cores.

Understood and agreed. The last statements in the commit message allude to this as being a future enhancement, but I can go ahead and make the changes now.

Make the printout per-CPU instead of once.

Especially if this will be per-CPU, it should probably be more compact. A \n per cache level seems like a lot of dmesg spam.

In D35366#801989, @greg_unrelenting.technology wrote:

Especially if this will be per-CPU, it should probably be more compact. A \n per cache level seems like a lot of dmesg spam.

It was supposed to be guarded under bootverbose, but I guess it got lost in the shuffle.

Is it better under bootverbose as it is, or compact and always printed?

If we print it for all CPUs it should be under bootverbose; if it's only printed when the caches differ between cores, it can always be on.
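
One possible reading of that policy, as a sketch only: always print for CPU 0, print for every CPU under bootverbose, and otherwise print only for CPUs whose caches differ from CPU 0's. The `struct cache_desc` type and the `should_print_cache` helper are hypothetical names and are not part of the revision.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU cache summary, sizes in KB. */
struct cache_desc {
	uint32_t l1i_kb;
	uint32_t l1d_kb;
	uint32_t l2_kb;
};

static bool
cache_desc_equal(const struct cache_desc *a, const struct cache_desc *b)
{
	return (a->l1i_kb == b->l1i_kb && a->l1d_kb == b->l1d_kb &&
	    a->l2_kb == b->l2_kb);
}

/*
 * Decide whether to print the cache summary for 'cpu'.
 * 'verbose' stands in for the kernel's bootverbose flag.
 */
static bool
should_print_cache(const struct cache_desc *cpus, size_t cpu, bool verbose)
{
	if (verbose || cpu == 0)
		return (true);
	/* Non-verbose boot: only report CPUs that differ from CPU 0. */
	return (!cache_desc_equal(&cpus[cpu], &cpus[0]));
}
```

Using the rk3399 figures quoted earlier in the thread, a small core matching CPU 0 would stay silent on a non-verbose boot, while a big core with 48k l1i and 1m l2 would be reported.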

sys/arm64/arm64/identcpu.c
1997

You should use the saved copy of this.

2006

The cacheline sizes can differ between cores in broken SoCs (there has been at least one released with different cacheline sizes).

2023

Can you create macros for these magic numbers?

2031

Do you still need to write to csselr_el1?

jhibbits edited the summary of this revision.

Update commit message.

sys/arm64/arm64/identcpu.c
2021

Why not pass in the existing sbuf from the caller?

2247

clidr & CLIDR_CTYPE_MASK

2250

Can you create the CSSELR macros so we can do CSSELR_Level(i) | CSSELR_IND?
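
A minimal sketch of how such macros might be used to walk the cache levels. The macro values below follow the architectural layout of CLIDR_EL1 (a 3-bit Ctype field per level) and CSSELR_EL1 (InD in bit 0, Level in bits 3:1); the names and definitions are illustrative and are not claimed to match what was committed to armreg.h. The sketch only computes the selector values; writing them to the register is omitted so it can run in userspace.

```c
#include <stdint.h>

/* Illustrative definitions, based on the architectural field layout. */
#define	CLIDR_CTYPE_MASK	0x7u	/* Ctype field of the lowest level */
#define	CLIDR_CTYPE_IO		0x1u	/* instruction cache only */
#define	CLIDR_CTYPE_DATA	0x2u	/* data cache only */
#define	CLIDR_CTYPE_ID		0x3u	/* separate instruction and data */
#define	CLIDR_CTYPE_UNIFIED	0x4u	/* unified cache */

#define	CSSELR_Level(i)		((uint64_t)(i) << 1)	/* zero-based level */
#define	CSSELR_IND		0x1u			/* select instruction cache */

/*
 * Store in sel[] the CSSELR_EL1 value for every cache described by clidr,
 * up to 'max' entries, and return how many were stored.
 */
static unsigned
csselr_selectors(uint64_t clidr, uint64_t *sel, unsigned max)
{
	unsigned n = 0;

	/* Shift one Ctype field out per level; stop at the first empty level. */
	for (unsigned i = 0; i < 7 && (clidr & CLIDR_CTYPE_MASK) != 0;
	    i++, clidr >>= 3) {
		unsigned ctype = (unsigned)(clidr & CLIDR_CTYPE_MASK);

		/* Data or unified cache at this level. */
		if ((ctype == CLIDR_CTYPE_DATA || ctype == CLIDR_CTYPE_ID ||
		    ctype == CLIDR_CTYPE_UNIFIED) && n < max)
			sel[n++] = CSSELR_Level(i);
		/* Instruction cache at this level. */
		if ((ctype == CLIDR_CTYPE_IO || ctype == CLIDR_CTYPE_ID) &&
		    n < max)
			sel[n++] = CSSELR_Level(i) | CSSELR_IND;
	}
	return (n);
}
```

For a CLIDR value describing separate L1 caches and a unified L2, this yields three selectors: L1 data, L1 instruction, and L2.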

sys/arm64/include/armreg.h
73

You don't need the _EL1 in these macros. It would also be useful to name them _MASK and _SHIFT to be consistent with other registers and to group the mask and shift macros so we have:

#define CCSIDR_NumSets64_MASK ...
#define CCSIDR_NumSets64_SHIFT ...
#define CCSIDR_NumSets_MASK ...
#define CCSIDR_NumSets_SHIFT ...
...
74

I think this should be 0x00FFFFFF00000000
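
For context, a sketch of how the mask and shift pairs could be used to turn a CCSIDR_EL1 value into a cache size. The field positions follow the architectural layout (NumSets in bits 27:13 and Associativity in bits 12:3 in the 32-bit format; NumSets in bits 55:32 and Associativity in bits 23:3 when FEAT_CCIDX is implemented), which is where a 24-bit NumSets mask of 0x00FFFFFF00000000 comes from. The Assoc and LineSize macro names and the `ccsidr_size_bytes` helper are assumptions for illustration, not the committed definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative definitions, grouped as mask/shift pairs. */
#define	CCSIDR_NumSets64_MASK	0x00FFFFFF00000000ull
#define	CCSIDR_NumSets64_SHIFT	32
#define	CCSIDR_NumSets_MASK	0x000000000FFFE000ull
#define	CCSIDR_NumSets_SHIFT	13
#define	CCSIDR_Assoc64_MASK	0x0000000000FFFFF8ull
#define	CCSIDR_Assoc64_SHIFT	3
#define	CCSIDR_Assoc_MASK	0x0000000000001FF8ull
#define	CCSIDR_Assoc_SHIFT	3
#define	CCSIDR_LineSize_MASK	0x0000000000000007ull
#define	CCSIDR_LineSize_SHIFT	0

/*
 * Cache size in bytes = line size * ways * sets.
 * 'ccidx' selects the 64-bit field layout used with FEAT_CCIDX.
 */
static uint64_t
ccsidr_size_bytes(uint64_t ccsidr, bool ccidx)
{
	uint64_t line, sets, ways;

	/* LineSize encodes log2(bytes per line) - 4. */
	line = 1ull << (((ccsidr & CCSIDR_LineSize_MASK) >>
	    CCSIDR_LineSize_SHIFT) + 4);
	if (ccidx) {
		ways = (ccsidr & CCSIDR_Assoc64_MASK) >> CCSIDR_Assoc64_SHIFT;
		sets = (ccsidr & CCSIDR_NumSets64_MASK) >>
		    CCSIDR_NumSets64_SHIFT;
	} else {
		ways = (ccsidr & CCSIDR_Assoc_MASK) >> CCSIDR_Assoc_SHIFT;
		sets = (ccsidr & CCSIDR_NumSets_MASK) >> CCSIDR_NumSets_SHIFT;
	}
	/* Both fields are stored as the real value minus one. */
	return (line * (ways + 1) * (sets + 1));
}
```

As a check on the arithmetic, a 32-bit-format value describing 128 sets, 4 ways and 64-byte lines decodes to 32768 bytes.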

This revision is now accepted and ready to land.
Jun 6 2022, 2:27 PM
This revision was automatically updated to reflect the committed changes.