Page MenuHomeFreeBSD

pmap_enter: always pass PG_M to pmap_enter_pde if PG_RW is set
Needs ReviewPublic

Authored by ashafer on Thu, Feb 26, 4:43 AM.
Tags
None
Referenced Files
F146545372: D55525.id.diff
Tue, Mar 3, 1:14 PM
F146545202: D55525.diff
Tue, Mar 3, 1:12 PM
Unknown Object (File)
Mon, Mar 2, 1:38 PM
Unknown Object (File)
Mon, Mar 2, 7:55 AM
Unknown Object (File)
Sun, Mar 1, 6:11 PM
Unknown Object (File)
Sun, Mar 1, 9:22 AM
Unknown Object (File)
Sun, Mar 1, 6:23 AM
Unknown Object (File)
Sun, Mar 1, 3:32 AM
Subscribers
This revision needs review, but there are no reviewers specified.

Details

Reviewers
None
Summary

This helps address a panic that is seen when certain DRM features are
exposed:
https://github.com/freebsd/drm-kmod/pull/120/changes/cde2fe679438bd5f4d300a61a907e29497618910

We need to enable these features again in order to provide support
for meteorlake and newer GPUs.

pmap_enter_pde panics on a failed assertion due to PG_M not being set
for the new entry. It seems like the intel driver remaps some memory
regions (remap_io_mapping->lkpi_vmf_insert_pfn_prot_locked) which set up
some number of pages. Later when pmap_enter_pde is called the write
protection is set but PG_M never gets triggered.

This change sets PG_M if the protection contains PG_RW to ensure that
we satisfy the conditions to use pmap_enter_pde.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 71016
Build 67899: arc lint + arc unit

Event Timeline

To clarify, this is my best guess for how to fix this but am very open to suggestions if this is not a good approach. I also confirmed that with this patch meteorlake graphics work properly, it does indeed resolve the panic. From my testing it seems things are fine, this seems like a solid strategy to ensure proper semantics, but I'm open to alternative suggestions for how to fix this. My theory is that normally this path is happening for unmanaged memory local to a specific device, but because the memory in this case is sysmem it is managed and PG_M does not get forced on earlier in this function.

Where the call to pmap_enter() comes from? It sounds as if flags missed VM_PROT_WRITE?

It's coming from vm_fault_populate. Here's the core.txt output: https://pastebin.com/ajLGGr9m

Also, I think the fault is for read access because the fault_flags do not contain write like you mentioned (vm_fault_trap). The fault is originating from the Xorg process although I'm not sure what the userspace stack is.

vm_fault_soft_fast() is careful to create a writable superpage mapping only if all constituent pages are dirty. In that case it will preset PG_M by setting fs->fault_type |= VM_PROT_WRITE. More generally, there is an invariant that writable superpage mappings are dirty, which is what the assertion is complaining about.

vm_fault_populate() does not preset VM_PROT_WRITE, even though it marks all constituent pages dirty at the MI layer (though I'm not 100% sure why it does that unconditionally... what if the mapping is read-only?). I think this is an oversight, so this might be a better solution:

diff --git a/sys/vm/vm_fault.c b/sys/vm/vm_fault.c
index addda72e2b56..e7a806dbe5ec 100644
--- a/sys/vm/vm_fault.c
+++ b/sys/vm/vm_fault.c
@@ -658,6 +658,8 @@ vm_fault_populate(struct faultstate *fs)
                        vm_fault_dirty(fs, &m[i]);
                }
                VM_OBJECT_WUNLOCK(fs->first_object);
+               if (fs->prot & VM_PROT_WRITE)
+                       fs->fault_type |= VM_PROT_WRITE;
                rv = pmap_enter(fs->map->pmap, vaddr, m, fs->prot, fs->fault_type |
                    (fs->wired ? PMAP_ENTER_WIRED : 0), psind);
 

Also, I think the fault is for read access because the fault_flags do not contain write like you mentioned (vm_fault_trap). The fault is originating from the Xorg process although I'm not sure what the userspace stack is.

May be, populate should set the fault type to VM_PROT_WRITE if the mapping has the VM_PROT_WRITE permission? It sounds as a bug in the populate/largepage layer, not in pmap.

P.S. Mark wrote the same above.

vm_fault_soft_fast() is careful to create a writable superpage mapping only if all constituent pages are dirty. In that case it will preset PG_M by setting fs->fault_type |= VM_PROT_WRITE. More generally, there is an invariant that writable superpage mappings are dirty, which is what the assertion is complaining about.

vm_fault_populate() does not preset VM_PROT_WRITE, even though it marks all constituent pages dirty at the MI layer (though I'm not 100% sure why it does that unconditionally... what if the mapping is read-only?). I think this is an oversight, so this might be a better solution:

Ah, I misread vm_fault_dirty() as vm_page_dirty(). I need to think about it more, but the patch below might be closer to being correct:

diff --git a/sys/vm/vm_fault.c b/sys/vm/vm_fault.c
index addda72e2b56..43a67eb024a8 100644
--- a/sys/vm/vm_fault.c
+++ b/sys/vm/vm_fault.c
@@ -642,6 +642,8 @@ vm_fault_populate(struct faultstate *fs)
                pager_last = map_last;
        }
        for (pidx = pager_first; pidx <= pager_last; pidx += npages) {
+               int flags;
+
                m = vm_page_lookup(fs->first_object, pidx);
                vaddr = fs->entry->start + IDX_TO_OFF(pidx) - fs->entry->offset;
                KASSERT(m != NULL && m->pindex == pidx,
@@ -652,12 +654,18 @@ vm_fault_populate(struct faultstate *fs)
                    !pmap_ps_enabled(fs->map->pmap)))
                        psind--;
 
+               flags = PS_ALL_DIRTY;
                npages = atop(pagesizes[psind]);
                for (i = 0; i < npages; i++) {
                        vm_fault_populate_check_page(&m[i]);
                        vm_fault_dirty(fs, &m[i]);
+                       if (m[i].dirty == 0)
+                               flags &= ~PS_ALL_DIRTY;
                }
                VM_OBJECT_WUNLOCK(fs->first_object);
+               if (psind > 0 && (fs->prot & VM_PROT_WRITE) != 0 &&
+                   (flags & PS_ALL_DIRTY) != 0)
+                       fs->fault_type |= VM_PROT_WRITE;
                rv = pmap_enter(fs->map->pmap, vaddr, m, fs->prot, fs->fault_type |
                    (fs->wired ? PMAP_ENTER_WIRED : 0), psind);
 

EDIT: Can confirm that first, more simple vm_fault proposal also works to prevent the panic but the second one does not.

I did still see the panic one time with the latest PS_ALL_DIRTY proposal. I can see that the fault_type is set to 0x1 (i.e. VM_PROT_READ) and a logging message I added does not get printed saying we set VM_PROT_WRITE. Sounds like psind is > 0 and it's a write protection but at least one of the pages is not dirty?