Page MenuHomeFreeBSD

Don't reset memory attributes when mapping physical addresses for ACPI.
Needs ReviewPublic

Authored by jhb on May 20 2019, 10:34 PM.

Details

Reviewers
kib
markj
alc
Summary

Previously, AcpiOsMemory was using pmap_mapbios which would always map
the requested address Write-Back (WB). For several AMD Ryzen laptops,
the BIOS uses AcpiOsMemory to directly access the PCI MCFG region in
order to access PCI config registers. This has the side effect of
remapping the MCFG region in the direct map as WB instead of UC
hanging the laptops during boot.

On the one laptop I examined in detail, the _PIC global method used to
switch from 8259A PICs to I/O APICs uses a pair of PCI config space
registers at offset 0x84 in the device at 0:0:0 to as a pair of
address/data registers to access an indirect register in the chipset
and clear a single bit to switch modes.

To fix, alter the semantics of pmap_mapbios() such that it does not
modify the attributes of any existing mappings and instead
uses the existing attributes. If a new mapping is created, this new
mapping uses WB (the default memory attribute).

PR: 231760

Test Plan
  • booted a simpler variant of this on two different AMD ryzen laptops that previously hung on boot in the hacker lounge at BSDCan

Diff Detail

Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 25293
Build 23956: arc lint + arc unit

Event Timeline

jhb created this revision.May 20 2019, 10:34 PM
jhb added a comment.May 20 2019, 10:41 PM

This isn't a final review yet in the sense that it needs more work before it can be committed. I wanted to get some thoughts though on how best to do this. Specifically, we could perhaps repurpose pmap_mapbios() and give it the semantics of pmap_mapphys() either by changing the implementation of pmap_mapbios() to have the semantics of the new thing (don't force WB, just map whatever is there). Alternatively, we could retire pmap_mapbios() and always use the new name if we think the new name is a better one in that case.

Another approach might be to have a special "VM_MEMATTR_DEFAULT" that is a separate numerical value which means "use whatever memattr is already set, and if creating a new mapping, use the system-wide default" and have pmap_mapbios use that instead of needing the separate SETATTR flag.

(BTW, riscv's pmap has a pmap_mapbios probably copied from arm64 even though riscv doesn't currently support ACPI)

markj added a reviewer: alc.May 21 2019, 1:00 AM
kib added a comment.May 21 2019, 6:23 AM

Is it too early for that problematic calls to pmap_mapdev_attr() to notice that the request is really to MCFG window and redirect the call to pmap_mapdev_pciecfg() ?

jhb added a comment.May 21 2019, 4:03 PM
In D20327#438343, @kib wrote:

Is it too early for that problematic calls to pmap_mapdev_attr() to notice that the request is really to MCFG window and redirect the call to pmap_mapdev_pciecfg() ?

Well, I'm also worried about BIOS writers reading registers from BARs directly using SystemMemory, etc. For the purposes of what AcpiOsMapMemory does, I think it doesn't really want to communicate a hard memattr of WB, I think it just wants to map it with whatever memattr might already be set.

kib added a comment.May 21 2019, 5:16 PM
In D20327#438439, @jhb wrote:
In D20327#438343, @kib wrote:

Is it too early for that problematic calls to pmap_mapdev_attr() to notice that the request is really to MCFG window and redirect the call to pmap_mapdev_pciecfg() ?

Well, I'm also worried about BIOS writers reading registers from BARs directly using SystemMemory, etc. For the purposes of what AcpiOsMapMemory does, I think it doesn't really want to communicate a hard memattr of WB, I think it just wants to map it with whatever memattr might already be set.

But to access SystemMemory address space, wouldn't ACPICA call AcpiOSMapMemory or AcpiOSRead/WriteMemory, which would still result in pmap_mapbios() ? And of course all remapping calls for MCFG should be denied, similarly how we avoid flushing cache for APIC page.

I wonder if this is what we need on aarch64 for the Ampere eMAG server to not panic in ACPI: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237055#c40

jhb added a comment.May 21 2019, 6:44 PM

What happens on the laptops today is that it calls AcpiOsMapMemory for the MCFG region which uses pmap_mapbios and switches the mapping in the direct map from UC to WB and flushes the cache which then hangs. What patch is changing is making AcpiOsMapMemory just leave the direct map as-is and if the requested address has been marked UC prior to the use of AcpiOsMapMemory, it just leaves it alone instead of changing it to WB. I am actually inclined to do this for all the pmap_mapbios calls by having them always honor existing attributes which was my first question above. A further question at that point is if we'd rather just use the new name always since 'mapbios' is kind of a crummy name. Originally 'mapbios' meant 'map a table provided by the BIOS', but given that AML is free to map whatever random thing it chooses, 'mapphys' is probably a better name (and more generic for platforms that don't have BIOS).

kib added a comment.May 21 2019, 7:07 PM
In D20327#438503, @jhb wrote:

What happens on the laptops today is that it calls AcpiOsMapMemory for the MCFG region which uses pmap_mapbios and switches the mapping in the direct map from UC to WB and flushes the cache which then hangs. What patch is changing is making AcpiOsMapMemory just leave the direct map as-is and if the requested address has been marked UC prior to the use of AcpiOsMapMemory, it just leaves it alone instead of changing it to WB. I am actually inclined to do this for all the pmap_mapbios calls by having them always honor existing attributes which was my first question above. A further question at that point is if we'd rather just use the new name always since 'mapbios' is kind of a crummy name. Originally 'mapbios' meant 'map a table provided by the BIOS', but given that AML is free to map whatever random thing it chooses, 'mapphys' is probably a better name (and more generic for platforms that don't have BIOS).

Yes, I both do not like that pmap_mapbios() does not change the attributes for random requests ( I believe that GPU apertures are mapped by pmap_mapbios()), and that it could be done more explicit, snce we actually care about MFCG. Let me provide more or less accurate sketch of what I mean. The code duplication can be optimized out, I am more about the approach.

diff --git a/sys/amd64/amd64/pmap.c b/sys/amd64/amd64/pmap.c
index 89ba9da19d8..8cbe126012d 100644
--- a/sys/amd64/amd64/pmap.c
+++ b/sys/amd64/amd64/pmap.c
@@ -7616,45 +7616,63 @@ pmap_mapdev_internal(vm_paddr_t pa, vm_size_t size, int mode, bool noflush)
 			panic("%s: Couldn't allocate KVA", __func__);
 	}
 	for (tmpsize = 0; tmpsize < size; tmpsize += PAGE_SIZE)
 		pmap_kenter_attr(va + tmpsize, pa + tmpsize, mode);
 	pmap_invalidate_range(kernel_pmap, va, va + tmpsize);
 	if (!noflush)
 		pmap_invalidate_cache_range(va, va + tmpsize);
 	return ((void *)(va + offset));
 }
 
+static bool pmap_mcfg_inited = false;
+static vm_paddr_t pmap_mcfg_base;
+static vm_size_t pmap_mcfg_size;
+
 void *
 pmap_mapdev_attr(vm_paddr_t pa, vm_size_t size, int mode)
 {
 
+	if (pmap_mcfg_inited && pa >= pmap_mcfg_base &&
+	    pa + size <= pmap_mcfg_base + pmap_mcfg_size)
+		return (pmap_mapdev_pciecfg(pa, size));
 	return (pmap_mapdev_internal(pa, size, mode, false));
 }
 
 void *
 pmap_mapdev(vm_paddr_t pa, vm_size_t size)
 {
 
+	if (pmap_mcfg_inited && pa >= pmap_mcfg_base &&
+	    pa + size <= pmap_mcfg_base + pmap_mcfg_size)
+		return (pmap_mapdev_pciecfg(pa, size));
 	return (pmap_mapdev_internal(pa, size, PAT_UNCACHEABLE, false));
 }
 
 void *
 pmap_mapdev_pciecfg(vm_paddr_t pa, vm_size_t size)
 {
 
+	if (!pmap_mcfg_inited) {
+		pmap_mcfg_inited = true;
+		pmap_mcfg_base = pa;
+		pmap_mcfg_size = size;
+	}
 	return (pmap_mapdev_internal(pa, size, PAT_UNCACHEABLE, true));
 }
 
 void *
 pmap_mapbios(vm_paddr_t pa, vm_size_t size)
 {
 
+	if (pmap_mcfg_inited && pa >= pmap_mcfg_base &&
+	    pa + size <= pmap_mcfg_base + pmap_mcfg_size)
+		return (pmap_mapdev_pciecfg(pa, size));
 	return (pmap_mapdev_internal(pa, size, PAT_WRITE_BACK, false));
 }
 
 void
 pmap_unmapdev(vm_offset_t va, vm_size_t size)
 {
 	struct pmap_preinit_mapping *ppim;
 	vm_offset_t offset;
 	int i;
jhb added a comment.May 21 2019, 9:33 PM
In D20327#438513, @kib wrote:
In D20327#438503, @jhb wrote:

What happens on the laptops today is that it calls AcpiOsMapMemory for the MCFG region which uses pmap_mapbios and switches the mapping in the direct map from UC to WB and flushes the cache which then hangs. What patch is changing is making AcpiOsMapMemory just leave the direct map as-is and if the requested address has been marked UC prior to the use of AcpiOsMapMemory, it just leaves it alone instead of changing it to WB. I am actually inclined to do this for all the pmap_mapbios calls by having them always honor existing attributes which was my first question above. A further question at that point is if we'd rather just use the new name always since 'mapbios' is kind of a crummy name. Originally 'mapbios' meant 'map a table provided by the BIOS', but given that AML is free to map whatever random thing it chooses, 'mapphys' is probably a better name (and more generic for platforms that don't have BIOS).

Yes, I both do not like that pmap_mapbios() does not change the attributes for random requests ( I believe that GPU apertures are mapped by pmap_mapbios()), and that it could be done more explicit, snce we actually care about MFCG. Let me provide more or less accurate sketch of what I mean. The code duplication can be optimized out, I am more about the approach.

Hmm, I guess I'd just rather avoid hardcoding this just for MCFG. In theory there can be multiple MCFG address ranges for example. But what I'm really paranoid about is some future bit of AML code that uses SystemMemory to read a register in a BAR. I wouldn't want that to remap the BAR to be WB instead of UC either. I really feel like AcpiOsMapMemory() should not be forcing WB, just using that as the default if a new mapping needs to be created. In the case of GPU apertures, are you saying that pmap_mapbios is being used to force GPU apertures to use WB that were previously mapped using something else? I don't see that in the current calls to pmap_mapbios:

amd64/acpica/acpi_machdep.c:    rsdp = pmap_mapbios(rsdp_ptr, sizeof(ACPI_TABLE_RSDP));
arm64/acpica/acpi_machdep.c:    header = pmap_mapbios(pa, sizeof(ACPI_TABLE_HEADER));
arm64/acpica/acpi_machdep.c:    table = pmap_mapbios(pa, length);
arm64/acpica/acpi_machdep.c:    table = pmap_mapbios(address, sizeof(ACPI_TABLE_HEADER));
arm64/acpica/acpi_machdep.c:    rsdp = pmap_mapbios(rsdp_ptr, sizeof(ACPI_TABLE_RSDP));
compat/x86bios/x86bios.c:       x86bios_ivt = pmap_mapbios(X86BIOS_IVT_BASE, X86BIOS_IVT_SIZE);
dev/acpica/acpi_pxm.c:  cpus = (struct cpu_info *)pmap_mapbios(addr, size);
dev/acpica/Osd/OsdMemory.c:    return (pmap_mapbios((vm_offset_t)PhysicalAddress, Length));
dev/ipmi/ipmi_smbios.c: header = pmap_mapbios(addr, sizeof(struct smbios_eps));
dev/ipmi/ipmi_smbios.c: table = pmap_mapbios(addr, header->length);
dev/ipmi/ipmi_smbios.c: table = pmap_mapbios(header->structure_table_address,
dev/pci/vga_pci.c:              return (pmap_mapbios(VGA_PCI_BIOS_SHADOW_ADDR, *size));
i386/acpica/acpi_machdep.c:     va = pmap_mapbios(0xffff0, 16);
i386/acpica/acpi_machdep.c:     rsdp = pmap_mapbios(rsdp_ptr, sizeof(ACPI_TABLE_RSDP));
x86/acpica/madt.c:      madt = pmap_mapbios(madt_physaddr, madt_length);

Of those, combat/x86/x86bios.c is mapping the real-mode IDT, dev/pci/vga_pci.c is mapping the VGA BIOS ROM at 0xc0000, and the rest are all mapping BIOS tables (smbios, ACPI, etc.) except for OsdMemory.c which is mapping anything AML asks for. I don't see any of these mapping GPU apertures?

kib added a comment.May 22 2019, 9:07 AM
In D20327#438582, @jhb wrote:
In D20327#438513, @kib wrote:
In D20327#438503, @jhb wrote:

What happens on the laptops today is that it calls AcpiOsMapMemory for the MCFG region which uses pmap_mapbios and switches the mapping in the direct map from UC to WB and flushes the cache which then hangs. What patch is changing is making AcpiOsMapMemory just leave the direct map as-is and if the requested address has been marked UC prior to the use of AcpiOsMapMemory, it just leaves it alone instead of changing it to WB. I am actually inclined to do this for all the pmap_mapbios calls by having them always honor existing attributes which was my first question above. A further question at that point is if we'd rather just use the new name always since 'mapbios' is kind of a crummy name. Originally 'mapbios' meant 'map a table provided by the BIOS', but given that AML is free to map whatever random thing it chooses, 'mapphys' is probably a better name (and more generic for platforms that don't have BIOS).

Yes, I both do not like that pmap_mapbios() does not change the attributes for random requests ( I believe that GPU apertures are mapped by pmap_mapbios()), and that it could be done more explicit, snce we actually care about MFCG. Let me provide more or less accurate sketch of what I mean. The code duplication can be optimized out, I am more about the approach.

Hmm, I guess I'd just rather avoid hardcoding this just for MCFG. In theory there can be multiple MCFG address ranges for example. But what I'm really paranoid about is some future bit of AML code that uses SystemMemory to read a register in a BAR. I wouldn't want that to remap the BAR to be WB instead of UC either. I really feel like AcpiOsMapMemory() should not be forcing WB, just using that as the default if a new mapping needs to be created. In the case of GPU apertures, are you saying that pmap_mapbios is being used to force GPU apertures to use WB that were previously mapped using something else? I don't see that in the current calls to pmap_mapbios:

amd64/acpica/acpi_machdep.c:    rsdp = pmap_mapbios(rsdp_ptr, sizeof(ACPI_TABLE_RSDP));
arm64/acpica/acpi_machdep.c:    header = pmap_mapbios(pa, sizeof(ACPI_TABLE_HEADER));
arm64/acpica/acpi_machdep.c:    table = pmap_mapbios(pa, length);
arm64/acpica/acpi_machdep.c:    table = pmap_mapbios(address, sizeof(ACPI_TABLE_HEADER));
arm64/acpica/acpi_machdep.c:    rsdp = pmap_mapbios(rsdp_ptr, sizeof(ACPI_TABLE_RSDP));
compat/x86bios/x86bios.c:       x86bios_ivt = pmap_mapbios(X86BIOS_IVT_BASE, X86BIOS_IVT_SIZE);
dev/acpica/acpi_pxm.c:  cpus = (struct cpu_info *)pmap_mapbios(addr, size);
dev/acpica/Osd/OsdMemory.c:    return (pmap_mapbios((vm_offset_t)PhysicalAddress, Length));
dev/ipmi/ipmi_smbios.c: header = pmap_mapbios(addr, sizeof(struct smbios_eps));
dev/ipmi/ipmi_smbios.c: table = pmap_mapbios(addr, header->length);
dev/ipmi/ipmi_smbios.c: table = pmap_mapbios(header->structure_table_address,
dev/pci/vga_pci.c:              return (pmap_mapbios(VGA_PCI_BIOS_SHADOW_ADDR, *size));
i386/acpica/acpi_machdep.c:     va = pmap_mapbios(0xffff0, 16);
i386/acpica/acpi_machdep.c:     rsdp = pmap_mapbios(rsdp_ptr, sizeof(ACPI_TABLE_RSDP));
x86/acpica/madt.c:      madt = pmap_mapbios(madt_physaddr, madt_length);

Of those, combat/x86/x86bios.c is mapping the real-mode IDT, dev/pci/vga_pci.c is mapping the VGA BIOS ROM at 0xc0000, and the rest are all mapping BIOS tables (smbios, ACPI, etc.) except for OsdMemory.c which is mapping anything AML asks for. I don't see any of these mapping GPU apertures?

drm is removed from src, but in fact I mis-remembered and I used explicit pmap_mapdev_attr() calls.

I believe that D20348 is the right direction, which would simultaneously avoid hard-coding MCFG and still do it in explicit way. For non-UEFI systems, we might make efi_memory_attribute() report something for MCFG. Or we might have some other function e.g. pmap_sys_memory_attr(), which would use efi_memory_attribute() on UEFI, but fall to manually configured regions + SMAP (?) otherwise.

jhb added a comment.Jun 10 2019, 7:14 PM
In D20327#438881, @kib wrote:
In D20327#438582, @jhb wrote:
In D20327#438513, @kib wrote:
In D20327#438503, @jhb wrote:

What happens on the laptops today is that it calls AcpiOsMapMemory for the MCFG region which uses pmap_mapbios and switches the mapping in the direct map from UC to WB and flushes the cache which then hangs. What patch is changing is making AcpiOsMapMemory just leave the direct map as-is and if the requested address has been marked UC prior to the use of AcpiOsMapMemory, it just leaves it alone instead of changing it to WB. I am actually inclined to do this for all the pmap_mapbios calls by having them always honor existing attributes which was my first question above. A further question at that point is if we'd rather just use the new name always since 'mapbios' is kind of a crummy name. Originally 'mapbios' meant 'map a table provided by the BIOS', but given that AML is free to map whatever random thing it chooses, 'mapphys' is probably a better name (and more generic for platforms that don't have BIOS).

Yes, I both do not like that pmap_mapbios() does not change the attributes for random requests ( I believe that GPU apertures are mapped by pmap_mapbios()), and that it could be done more explicit, snce we actually care about MFCG. Let me provide more or less accurate sketch of what I mean. The code duplication can be optimized out, I am more about the approach.

Hmm, I guess I'd just rather avoid hardcoding this just for MCFG. In theory there can be multiple MCFG address ranges for example. But what I'm really paranoid about is some future bit of AML code that uses SystemMemory to read a register in a BAR. I wouldn't want that to remap the BAR to be WB instead of UC either. I really feel like AcpiOsMapMemory() should not be forcing WB, just using that as the default if a new mapping needs to be created. In the case of GPU apertures, are you saying that pmap_mapbios is being used to force GPU apertures to use WB that were previously mapped using something else? I don't see that in the current calls to pmap_mapbios:

amd64/acpica/acpi_machdep.c:    rsdp = pmap_mapbios(rsdp_ptr, sizeof(ACPI_TABLE_RSDP));
arm64/acpica/acpi_machdep.c:    header = pmap_mapbios(pa, sizeof(ACPI_TABLE_HEADER));
arm64/acpica/acpi_machdep.c:    table = pmap_mapbios(pa, length);
arm64/acpica/acpi_machdep.c:    table = pmap_mapbios(address, sizeof(ACPI_TABLE_HEADER));
arm64/acpica/acpi_machdep.c:    rsdp = pmap_mapbios(rsdp_ptr, sizeof(ACPI_TABLE_RSDP));
compat/x86bios/x86bios.c:       x86bios_ivt = pmap_mapbios(X86BIOS_IVT_BASE, X86BIOS_IVT_SIZE);
dev/acpica/acpi_pxm.c:  cpus = (struct cpu_info *)pmap_mapbios(addr, size);
dev/acpica/Osd/OsdMemory.c:    return (pmap_mapbios((vm_offset_t)PhysicalAddress, Length));
dev/ipmi/ipmi_smbios.c: header = pmap_mapbios(addr, sizeof(struct smbios_eps));
dev/ipmi/ipmi_smbios.c: table = pmap_mapbios(addr, header->length);
dev/ipmi/ipmi_smbios.c: table = pmap_mapbios(header->structure_table_address,
dev/pci/vga_pci.c:              return (pmap_mapbios(VGA_PCI_BIOS_SHADOW_ADDR, *size));
i386/acpica/acpi_machdep.c:     va = pmap_mapbios(0xffff0, 16);
i386/acpica/acpi_machdep.c:     rsdp = pmap_mapbios(rsdp_ptr, sizeof(ACPI_TABLE_RSDP));
x86/acpica/madt.c:      madt = pmap_mapbios(madt_physaddr, madt_length);

Of those, combat/x86/x86bios.c is mapping the real-mode IDT, dev/pci/vga_pci.c is mapping the VGA BIOS ROM at 0xc0000, and the rest are all mapping BIOS tables (smbios, ACPI, etc.) except for OsdMemory.c which is mapping anything AML asks for. I don't see any of these mapping GPU apertures?

drm is removed from src, but in fact I mis-remembered and I used explicit pmap_mapdev_attr() calls.
I believe that D20348 is the right direction, which would simultaneously avoid hard-coding MCFG and still do it in explicit way. For non-UEFI systems, we might make efi_memory_attribute() report something for MCFG. Or we might have some other function e.g. pmap_sys_memory_attr(), which would use efi_memory_attribute() on UEFI, but fall to manually configured regions + SMAP (?) otherwise.

While I think we should be pre-populating the direct mappings attributes based on whatever EFI or SMAP, etc. reports, I do think we should still change AcpiOsMapMemory to never change an existing attribute but just use what is there. As we discussed in D20348 I think we should just honor the EFI attributes from the start, and we can force UC mappings if needed for MCFG when parsing the MCFG table. However, we will still need to have some function that AcpiOsMapMemory can use. I think that changing pmap_mapbios to never set an attribute at all is probably how we will still want to do that, and that approach will also work now while we wait for the other approaches to be completed. In terms of the name, I think pmap_mapphys is perhaps a more accurate name than pmap_mapbios as it is used on platforms without any BIOS at all but some other type of firmware.

kib accepted this revision.Jun 10 2019, 8:07 PM
This revision is now accepted and ready to land.Jun 10 2019, 8:07 PM
jhb updated this revision to Diff 59537.Mon, Jul 8, 3:38 PM
  • Rename back to pmap_mapbios.
  • Change pmap_mapbios on i386 to not change attributes.
  • Don't look for an existing page for pre-init mappings.
  • Don't rewrite PTEs for pmap_mapbios of "low" mappings.
This revision now requires review to proceed.Mon, Jul 8, 3:38 PM
jhb edited the summary of this revision. (Show Details)Mon, Jul 8, 4:50 PM

Rather than conflating the renaming of pmap_mapbios with this change, I've just altered the semantics of the existing function for now. I do think a separate followup to rename pmap_mapbios to pmap_mapphys might be nice, but I don't think we need yet another variant of pmap_mapdev.

kib added inline comments.Tue, Jul 9, 9:21 PM
sys/amd64/amd64/pmap.c
7235–7240

!= 0

sys/i386/i386/pmap.c
301

Start with 1. I do not think we need compat with amd64.

5411–5413

I think this condition should be added to the condition on line 5412. We can only use low PDE when we are fine with the existing attrs, otherwise we must remap.

Also style: == 0.

5446

I think we still do not initialize all pages in vm_page_array. Noted just in case this would cause a problem.

jhb marked 2 inline comments as done.Thu, Jul 11, 12:28 AM
jhb added inline comments.
sys/amd64/amd64/pmap.c
7235–7240

I really do not like that particular interpretation of that rule for single-bit flags. Testing a single-bit flag always looks like a boolean to me.

sys/i386/i386/pmap.c
301

Ok. In theory if anyone tried to boot an i386 kernel on the affected AMD systems they'd need the flush cache hack.

Hmm, actually, no. The i386 cfgreg code uses pmap_kenter instead of pmap_mapdev so it doesn't do a cache flush.

5411–5413

Hmm, I think we want to change the low pmap if SETATTR is set. This is important for the rule that all PTEs for a given PA must use the same attribute.

5446

Is there a way to determine if a vm_page_t is valid?

jhb marked 2 inline comments as done.Thu, Jul 11, 12:34 AM
jhb updated this revision to Diff 59623.
  • Style fixes.
kib added inline comments.Thu, Jul 11, 6:34 PM
sys/i386/i386/pmap.c
5411–5413

I agree. I think this can be postponed.

5446

You might compare VM_PAGE_TO_PHYS(m) with pa.

I have WIP for making x86/iommu used by bhyve, where I have to initialize whole vm_page_array.