
acpica: Fix SRAT memory allocation and CPU enablement handling
Needs Review, Public

Authored by ziaee on Sat, Jun 20, 12:42 AM.
Tags
None
Referenced Files
F161158406: D57692.id.diff
Wed, Jul 1, 2:25 AM
F161148562: D57692.diff
Wed, Jul 1, 12:05 AM

Details

Summary

This commit fixes two boot issues on GCE c4a-metal instances related to
the System Resource Affinity Table (SRAT):

  1. Smart Early Memory Allocation:

During early boot, acpi_pxm_init() allocates memory for the CPU affinity
array by stealing memory from the end of the last physical memory region
in the phys_avail list. On c4a-metal, the last physical memory chunk is
extremely small (less than the required size), causing a KASSERT(addr >=
phys_avail[idx]) panic.
Fix this by walking the phys_avail list backwards to find the first
physical memory region that is actually large enough to satisfy the
allocation, rather than assuming the last one always has space. If no
region has enough space, we fall back to disabling SRAT gracefully.
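
The backward walk described above can be sketched roughly as follows. This is a minimal user-space illustration, not the code from the diff: `paddr_t` and `steal_from_tail()` are hypothetical stand-ins for `vm_paddr_t` and the logic in acpi_pxm_init(), the array argument stands in for phys_avail, and page alignment is ignored. It assumes the array holds (start, end) pairs terminated by a zero pair.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t paddr_t;	/* hypothetical stand-in for vm_paddr_t */

/*
 * Hypothetical helper: carve `size` bytes off the top of the highest
 * region that can hold them.  `avail` holds (start, end) pairs and is
 * terminated by a (0, 0) pair, as assumed for phys_avail[].
 * Returns 0 and stores the carved address in *out on success, or -1
 * if no region is large enough.
 */
static int
steal_from_tail(paddr_t *avail, size_t size, paddr_t *out)
{
	int i, idx;

	/* Advance i to the terminating pair. */
	for (i = 0; avail[i + 1] != 0; i += 2)
		continue;

	/* Walk backwards from the last region towards the first. */
	for (idx = i - 2; idx >= 0; idx -= 2) {
		/*
		 * Assumption of this sketch: require strictly more than
		 * `size` so the region never shrinks to zero length.
		 */
		if (avail[idx + 1] - avail[idx] > (paddr_t)size) {
			avail[idx + 1] -= size;	/* shrink from the top */
			*out = avail[idx + 1];
			return (0);
		}
	}
	return (-1);
}
```

In this sketch a caller would treat a -1 return as the signal to disable SRAT instead of asserting.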

  2. CPU Enable Flag Handling:

The firmware on c4a-metal lists several CPUs as disabled in the SRAT
table, even though they are enabled in the MADT (APIC) table and start
up successfully. The original SRAT parser skipped disabled entries,
which caused these CPUs to boot without a proximity domain assigned,
leading to unpredictable behavior or panics.
Fix this by removing the early exit in srat_parse_entry() for disabled
SRAT GICC entries. This ensures all CPUs listed in SRAT get assigned to
their defined proximity domain (or domain 0) to match their
MADT-activated state.
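
As a rough illustration of the change in behavior, the sketch below records a proximity domain for every GICC entry regardless of its enabled flag. It is not the code from the diff: `struct gicc_affinity`, `record_gicc()`, `cpu_domain[]`, and `MAX_CPUS` are hypothetical names, the structure is a simplified stand-in for the SRAT GICC affinity entry, and indexing directly by processor UID is an assumption made to keep the example small.

```c
#include <stdint.h>

#define SRAT_GICC_ENABLED	0x1u	/* bit 0 of the entry's flags */
#define MAX_CPUS		8	/* hypothetical table size */

/* Simplified, hypothetical stand-in for an SRAT GICC affinity entry. */
struct gicc_affinity {
	uint32_t proximity_domain;
	uint32_t acpi_processor_uid;
	uint32_t flags;
};

/* Zero-initialized, so CPUs with no SRAT entry default to domain 0. */
static int cpu_domain[MAX_CPUS];

/*
 * Record the domain for one entry.  There is deliberately no early
 * return when SRAT_GICC_ENABLED is clear: a CPU marked disabled here
 * may still be started via the MADT, so its domain is kept.
 * Returns 0 on success, -1 if the UID does not fit in the table.
 */
static int
record_gicc(const struct gicc_affinity *e)
{
	if (e->acpi_processor_uid >= MAX_CPUS)
		return (-1);
	cpu_domain[e->acpi_processor_uid] = (int)e->proximity_domain;
	return (0);
}
```

An entry with the enabled flag clear is therefore handled the same way as an enabled one in this sketch.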

Note: The SRAT, DSDT, and APIC dumps are available at freebsd.org/~ziaee/tmp/c4a-*

Authored by: Jasper Tran O'Leary <jtranoleary@google.com>
Sponsored by: Google Cloud

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 74120
Build 71003: arc lint + arc unit

Event Timeline

ziaee held this revision as a draft.
ziaee published this revision for review. Sat, Jun 20, 2:29 AM
ziaee changed the visibility from "Public (No Login Required)" to "committers (Project)".
sys/dev/acpica/acpi_pxm.c
318–326

I think we should loudly complain about situations like this since this is a firmware bug.
I'd restore the check but just print out a message instead of breaking.

535–537

Not sure what happened here; I assume they accidentally deleted size.

The idx size typo was my own, sorry! Add back the check.

ziaee changed the visibility from "committers (Project)" to "Public (No Login Required)". Tue, Jun 23, 3:56 PM