Page MenuHomeFreeBSD

linux/io: handle memtype_wc mapping for !DMAP range
ClosedPublic

Authored by mhorne on Tue, May 12, 2:49 PM.
Tags
None
Referenced Files
F157160033: D56971.diff
Mon, May 18, 9:35 PM
F157114617: D56971.diff
Mon, May 18, 1:20 PM
F157114603: D56971.diff
Mon, May 18, 1:20 PM
F157035131: D56971.diff
Mon, May 18, 1:42 AM
Unknown Object (File)
Sun, May 17, 11:11 PM
Unknown Object (File)
Sun, May 17, 5:42 PM
Unknown Object (File)
Sun, May 17, 4:10 PM
Unknown Object (File)
Sun, May 17, 4:09 PM

Details

Summary

The amdgpu driver in drm-kmod will attempt to update/reserve certain GPU
VRAM ranges as write-combining. Depending on the system, this address
range may fall outside of FreeBSD's constructed DMAP. We cannot use
pmap_change_attr() in this case.

When INVARIANTS is enabled, this results in the following:

panic: physical address 0x880000000 not covered by the DMAP

Add a guard against triggering the KASSERT in PHYS_TO_DMAP().

This limitation in our implementation of arch_io_reserve_memtype_wc() is
already known in drm-kmod's amdgpu_bo_init(), and errors are ignored
there (see "BSDFIXME"). This change is only to eliminate the preventable
assertion failure within this scheme.

Reported by: kevans (https://github.com/freebsd/drm-kmod/issues/436)

My system also has this symptom.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Do you need to check that the whole requested range fits into DMAP?

In D56971#1305391, @kib wrote:

Do you need to check that the whole requested range fits into DMAP?

I suspect the range requested will almost always be entirely outside of DMAP, but it's probably worth being cautious.

FWIW- this does unblock part of my amdgpu problem. I debugged the other half, and I'll post that bit back to the drm-kmod bug. Thanks!

Presumably if the range is only partially outside DMAP we'll just fall back to the same assertion we encounter now, but it does seem more correct to check the whole range.

On amd64 it is certainly enough to check PHYS_IN_DMAP(start+size), since DMAP always goes from 0 to TOP. Not sure about other arches.

This revision is now accepted and ready to land.Wed, May 13, 8:19 PM

Only tangentially related, have you hit this one with your GPU?

panic: set_pages_wb: numpages 2
cpuid = 6
time = 1778680160
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00fe91a8f0
vpanic() at vpanic+0x149/frame 0xfffffe00fe91aa20
panic() at panic+0x43/frame 0xfffffe00fe91aa80
ttm_pool_shrink() at ttm_pool_shrink+0x12f/frame 0xfffffe00fe91aab0
ttm_pool_shrinker_scan() at ttm_pool_shrinker_scan+0x18/frame 0xfffffe00fe91aae0
linuxkpi_vm_lowmem() at linuxkpi_vm_lowmem+0x7d/frame 0xfffffe00fe91ab30
vm_pageout_worker() at vm_pageout_worker+0x396/frame 0xfffffe00fe91aeb0
vm_pageout() at vm_pageout+0x1d7/frame 0xfffffe00fe91aef0
fork_exit() at fork_exit+0x82/frame 0xfffffe00fe91af30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00fe91af30

The ttm_pool bits seem to setup pools at every pagesize on the system and we very periodically have to actually purge something from them. That seems to be the *only* place we use set_pages_wb (at least in drm-kmod) and I'm not sure I see anything obvious that would prevent the use of the psind > 0 pools.

Only tangentially related, have you hit this one with your GPU?

panic: set_pages_wb: numpages 2
cpuid = 6
time = 1778680160
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00fe91a8f0
vpanic() at vpanic+0x149/frame 0xfffffe00fe91aa20
panic() at panic+0x43/frame 0xfffffe00fe91aa80
ttm_pool_shrink() at ttm_pool_shrink+0x12f/frame 0xfffffe00fe91aab0
ttm_pool_shrinker_scan() at ttm_pool_shrinker_scan+0x18/frame 0xfffffe00fe91aae0
linuxkpi_vm_lowmem() at linuxkpi_vm_lowmem+0x7d/frame 0xfffffe00fe91ab30
vm_pageout_worker() at vm_pageout_worker+0x396/frame 0xfffffe00fe91aeb0
vm_pageout() at vm_pageout+0x1d7/frame 0xfffffe00fe91aef0
fork_exit() at fork_exit+0x82/frame 0xfffffe00fe91af30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00fe91af30

The ttm_pool bits seem to setup pools at every pagesize on the system and we very periodically have to actually purge something from them. That seems to be the *only* place we use set_pages_wb (at least in drm-kmod) and I'm not sure I see anything obvious that would prevent the use of the psind > 0 pools.

I haven't, although I've only done the minimal testing that my amdgpu system is bootable with INVARIANTS. Perhaps if I start running GENERIC daily I will see it.

I don't know what to make of this code, but something seems inconsistent between passing a value of 1 << order, but the function names this argument numpages.

Only tangentially related, have you hit this one with your GPU?

panic: set_pages_wb: numpages 2
cpuid = 6
time = 1778680160
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00fe91a8f0
vpanic() at vpanic+0x149/frame 0xfffffe00fe91aa20
panic() at panic+0x43/frame 0xfffffe00fe91aa80
ttm_pool_shrink() at ttm_pool_shrink+0x12f/frame 0xfffffe00fe91aab0
ttm_pool_shrinker_scan() at ttm_pool_shrinker_scan+0x18/frame 0xfffffe00fe91aae0
linuxkpi_vm_lowmem() at linuxkpi_vm_lowmem+0x7d/frame 0xfffffe00fe91ab30
vm_pageout_worker() at vm_pageout_worker+0x396/frame 0xfffffe00fe91aeb0
vm_pageout() at vm_pageout+0x1d7/frame 0xfffffe00fe91aef0
fork_exit() at fork_exit+0x82/frame 0xfffffe00fe91af30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00fe91af30

The ttm_pool bits seem to setup pools at every pagesize on the system and we very periodically have to actually purge something from them. That seems to be the *only* place we use set_pages_wb (at least in drm-kmod) and I'm not sure I see anything obvious that would prevent the use of the psind > 0 pools.

I haven't, although I've only done the minimal testing that my amdgpu system is bootable with INVARIANTS. Perhaps if I start running GENERIC daily I will see it.

I don't know what to make of this code, but something seems inconsistent between passing a value of 1 << order, but the function names this argument numpages.

Right, the naming convention seems to have been lifted from Linux itself; order ends up basically being the number of paired pages or something to that effect. Looking closer, I'm not sure I see why set_pages_wb wouldn't be able to just do set_memory_wb(page_to_virt(page), numpages); and handle both cases.

The below patch seems to fix my laptop's stability, but I don't know if page_to_virt(page) is actually the correct thing here. I would imagine we're always talking about slices of kernel memory for these functions, so it's probably fine?

index f328fcabd243..9eeb34a3c858 100644
--- a/sys/compat/linuxkpi/common/include/asm/set_memory.h
+++ b/sys/compat/linuxkpi/common/include/asm/set_memory.h
@@ -29,6 +29,7 @@
 #ifndef _LINUXKPI_ASM_SET_MEMORY_H_
 #define        _LINUXKPI_ASM_SET_MEMORY_H_
 
+#include <linux/mm.h>
 #include <linux/page.h>
 
 static inline int
@@ -65,32 +66,19 @@ set_memory_wb(unsigned long addr, int numpages)
 static inline int
 set_pages_uc(struct page *page, int numpages)
 {
-       KASSERT(numpages == 1, ("%s: numpages %d", __func__, numpages));
-
-       pmap_page_set_memattr(page, VM_MEMATTR_UNCACHEABLE);
-       return (0);
+       return (set_memory_uc((unsigned long)page_to_virt(page), numpages));
 }
 
 static inline int
 set_pages_wc(struct page *page, int numpages)
 {
-       KASSERT(numpages == 1, ("%s: numpages %d", __func__, numpages));
-
-#ifdef VM_MEMATTR_WRITE_COMBINING
-       pmap_page_set_memattr(page, VM_MEMATTR_WRITE_COMBINING);
-#else
-       return (set_pages_uc(page, numpages));
-#endif
-       return (0);
+       return (set_memory_wc((unsigned long)page_to_virt(page), numpages));
 }
 
 static inline int
 set_pages_wb(struct page *page, int numpages)
 {
-       KASSERT(numpages == 1, ("%s: numpages %d", __func__, numpages));
-
-       pmap_page_set_memattr(page, VM_MEMATTR_WRITE_BACK);
-       return (0);
+       return (set_memory_wb((unsigned long)page_to_virt(page), numpages));
 }
 
 static inline int