Page MenuHomeFreeBSD

riscv: Exclude OpenSBI memory regions when booting with EFI
ClosedPublic

Authored by bnovkov on Tue, Apr 15, 5:12 PM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, May 1, 3:35 AM
Unknown Object (File)
Thu, Apr 24, 12:06 AM
Unknown Object (File)
Wed, Apr 23, 1:26 PM
Unknown Object (File)
Mon, Apr 21, 7:37 AM
Unknown Object (File)
Sun, Apr 20, 12:06 AM
Unknown Object (File)
Sat, Apr 19, 11:56 PM
Unknown Object (File)
Sat, Apr 19, 11:11 PM
Unknown Object (File)
Sat, Apr 19, 9:41 PM
Subscribers

Details

Summary

OpenSBI uses the first PMP entry to prevent buggy supervisor
software from overwriting the firmware [1]. However, this
region may not be properly marked as reserved in the EFI map, leading
to an access violation exception whenever the kernel
attempts to write to a page from that region.

Fix this by preemptively excluding first EFI memory map entry
if it is marked as "BootServicesData".

[1] https://github.com/riscv-non-isa/riscv-sbi-doc/pull/37

Reported by: tuexen
Tested by: tuexen
Fixes: a2e2178402af

Test Plan

An early boot crash was observed on two different boards (HiFive Unmatched Rev. B and Milk-V Duo S) by both tuexen@ and myself:

Loading kernel...                                                   
[[snip]]
sstatus: 0x8000000200006100
stval  : 0xffffffd000000000
panic: Memory access exception at 0xffffffc0005d71d0: 0xffffffd000000000
cpuid = 0
time = 1
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x36
kdb_backtrace() at kdb_backtrace+0x2c
vpanic() at vpanic+0x16e
panic() at panic+0x26
do_trap_supervisor() at do_trap_supervisor+0x108
cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x74
--- exception 7, tval = 0xffffffd000000000
memset() at memset+0x12
pmap_zero_page() at pmap_zero_page+0x34
vm_page_alloc_noobj_domain() at vm_page_alloc_noobj_domain+0x240
uma_small_alloc() at uma_small_alloc+0x66
keg_alloc_slab() at keg_alloc_slab+0xb0
zone_import() at zone_import+0xf6
zone_alloc_item() at zone_alloc_item+0x68
zone_ctor() at zone_ctor+0x542
uma_startup1() at uma_startup1+0x184
vm_mem_init() at vm_mem_init+0x3c
mi_startup() at mi_startup+0x1ee
va() at va+0x60
KDB: enter: panic

The first physmem page got allocated by vm_page_alloc_noobj_domain and was marked
with PG_ZERO, leading to a write in a PMP-protected region when pmap_zero_page was invoked.
Both devices boot properly with this patch.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

Well this is just wrong, boot services data can be reclaimed once you've exited boot services. Linux seems to work because it always honours the FDT's reserved memory regions, so we should follow that to work around this broken firmware (in fact AFAICT Linux doesn't even bother to parse the table, even on arm64, which is surprising... not sure how ACPI ends up working). And please report a bug against the firmware in question so this can actually get fixed to correctly mark it as EfiRuntimeServicesCode/Data or EfiReservedMemoryType.

So while this is wrong, it's also right. More generally, we can't use BootServices{Code,Data} areas safely.
Yes. 100% the spec says we can. Sure, the spec say that. However, I also see on x86 (the Linux code) where
they don't start using the boot services areas until AFTER they've called SetVirtualAddresses (lots of notes
about buggy firmware). Also of interest, on x86 at least, is that often times there's data needed by different
drivers buried in these sections. Linux does some aweful ugly hacks to split these sections up and keeps
them around as runtime areas to map in (note: we'll need to stop assuming PA == VA, but that's unrelated
to this and also a longer conversation with kib@). Given that Linux does ugly things, it's no wonder that the
bug has gone undetected.

Speaking of ugly things, Linux does parse the EFI memory map table on x86. It's buried deep, but I'm doing
LinuxBoot stuff and have found it. It translates the table early on into the e820 memory format and uses
that everywhere else. It also kinda sorta keeps a shadow copy around (or recreates it, I'm still studying the code)
so that it can pass something to the next kernel, but I'm still in 'unhappy' land about that and how to get
that data and what to pass into the FreeBSD kernel.... The nice order world I though was there turns out
to be a nicely polished turd that stinks if you put your face too close to it.

sys/riscv/riscv/machdep.c
551

This will skip all the leading entries that are BS_DATA, not just the first one.

bnovkov marked an inline comment as done.

Address @imp 's comment and place upper limit on excluded region size.

Well this is just wrong, boot services data can be reclaimed once you've exited boot services. Linux seems to work because it always honours the FDT's reserved memory regions, so we should follow that to work around this broken firmware (in fact AFAICT Linux doesn't even bother to parse the table, even on arm64, which is surprising... not sure how ACPI ends up working). And please report a bug against the firmware in question so this can actually get fixed to correctly mark it as EfiRuntimeServicesCode/Data or EfiReservedMemoryType.

I apologize for the confusion in the initial patch, the intention was to leave out the first entry only (and only if it's marked as BootServicesData).
Note that this patch makes sure that EFI follows the same policy as FDT in fdt_physmem_hardware_region_cb above (i.e., excluding the first 2MB of physical memory).
However, the previous iteration of the patch could've excluded more than 2MB so I've placed an upper limit on the size of the excluded region.

While I agree this is caused by buggy firmware, I don't think we should flat-out refuse to boot. Printing a very loud warning might be better.

sys/riscv/riscv/machdep.c
551

Whoops, looks like I accidentally introduced this during a style(9) pass, thank you!

Just tested the latest version. No problems observed.

This revision is now accepted and ready to land.Wed, Apr 16, 12:32 AM