Page MenuHomeFreeBSD

amd64 pmap: eliminate two explicit panics on low mem conditions
ClosedPublic

Authored by kib on Jun 22 2025, 3:44 AM.
Tags
None
Referenced Files
F133583602: D50970.id157553.diff
Sun, Oct 26, 7:58 PM
F133469874: D50970.id157742.diff
Sun, Oct 26, 12:51 AM
Unknown Object (File)
Sat, Oct 25, 10:09 AM
Unknown Object (File)
Sat, Oct 25, 3:11 AM
Unknown Object (File)
Sat, Oct 25, 1:43 AM
Unknown Object (File)
Thu, Oct 23, 4:51 AM
Unknown Object (File)
Thu, Oct 23, 1:26 AM
Unknown Object (File)
Tue, Oct 21, 8:21 AM
Subscribers

Details

Summary
amd64 pmap: do not panic on inability to insert ptp into trie

When pmap_enter_pde() needs to destroy existing kernel superpage
mapping, do not remove saved pt page from the pm_root trie.  Then the
pt page does not need to be re-inserted into the trie.

If the kernel region is not mapped with the superpage, try to insert the
pt page into pm_root trie before clearing ptes.  If failed, we can
return failure without a need to rewind.

Suggested by:   alc


dev/mem: use sx instead of rw lock

Some ops require sleepable context to success, like DMAP demotion.


amd64 pmap: preallocate pt page foir pmap_demote_pdpe() in pmap_demote_DMAP()

Allocate the page outside the kernel_pmap locked region, and pass it to
pmap_demote_pdpe() instead of panicing if VM_ALLOC_INTERRUPT failing.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

kib requested review of this revision.Jun 22 2025, 3:44 AM
sys/vm/vm_radix.c
140

If parentp == NULL, node is not linked into the tree. I don't see how it ever gets freed, so I think it's a memory leak.

kib marked an inline comment as done.Jun 22 2025, 3:19 PM
kib added inline comments.
sys/vm/vm_radix.c
140

I see, in this case pctrie_insert_lookup_compound() uses existing nul leaf. Am I right?

I changed the interface to vm_radix_insert_prealloc() to take **node, to signal that the *node was not used.

kib marked an inline comment as done.

Change vm_radix_insert_prealloc() to indicate if node was used. Pass &node and NULL it if insertion consumed it.

sys/amd64/amd64/pmap.c
10065

I think this can be reached via mem_range_attr_set(), and because of the rwlock this will be a non-sleepable context.

sys/amd64/amd64/pmap.c
10058

BTW, do you have an explanation why it is safe to ignore the demote requests with len > 1G?

sys/amd64/amd64/pmap.c
10065

It seems that we can change mr_lock to sx. At least all callers are unlocked or in sleepable context.

I looked at the use of it in linuxkpi, and lkpi_arch_phys_wc_del() potentially might be called from the non-sleepable place, the wc_add() sleeps by itself. But I do not see a need in the split if we do MEMRANGE_SET_REMOVE(), since it was already demoted by the SET_UPDATE.

sys/amd64/amd64/pmap.c
10058

I think I see it myself. If some pdp is participating in the mapping of the region which we demote, then it must fully fit into the region, by the assertion that base is aligned at len.

In other words, this is indeed not a generic demotion.

sys/amd64/amd64/pmap.c
7602

This means you could fail in a case where an allocation wasn't needed, where currently you would succeed.

7667

I assume that with this change, this line will never be reached.

Why can't

pmap_abort_ptp(pmap, va, pdpg);
return (KERN_RESOURCE_SHORTAGE);

appear here, instead of doing the early node allocation?

sys/vm/vm_radix.c
140

You are right.

Why not just call vm_radix_node_free(*node)) here, in the parentp==NULL case?

sys/amd64/amd64/pmap.c
7602

Yes. But IMO either conditions are very rare, and have the machine continuing running is more important.

7667

We already cleared page table entries in the if/else block above. If we are going to return failure there, we must restore the mappings, at least for the kernel pmap. The rollback is harder.

sys/vm/vm_radix.c
140

Caller still needs to call node_free(), so it is slightly less code.
Also, the interface is symmetric this way: caller allocated, so it might need to free.

Switch mr_lock to sx.
Minor note in the comment for pmap_demote_PDE().

sys/amd64/amd64/pmap.c
7667
7735

It's possible to have node == NULL here, but uma_zfree_smr(NULL) is not permitted.

10045
10064
10065

It seems to be ok for both functions to sleep. They are probably only used by drm-kmod, and there the functions are called during driver init and deinit.

sys/vm/vm_radix.h
361

Should it be grouped with the other prototypes at the beginning of the file?

kib marked 10 inline comments as done.

Check node for NULL before freeing it.
Reword comments.

sys/amd64/amd64/pmap.c
9666

Here you are handling the case where *mp is null, but why? It cannot happen today, and in general it is probably better for the caller to handle allocation failures.

sys/amd64/amd64/pmap.c
9666

if mp == NULL, we allocate with pmap_alloc_pt_page(), which might fail.

sys/amd64/amd64/pmap.c
9666

I'm talking about the case *mp == NULL, not mp == NULL. If mp != NULL, then I think we must have pdpg != NULL here.

sys/amd64/amd64/pmap.c
9666

Then I do not understand what do you mean.

I have to handle the possibility of pdpg == NULL, and this is all what I do there. pdpg can be NULL if mp is NULL.

I do not see a place where I have a check for *mp == NULL as is.

sys/amd64/amd64/pmap.c
9666

I think the code should be structured like this:

if (mp == NULL) {
    pdpg = pmap_alloc_pt_page(...);
    if (pdpg == NULL)
        return (false);
} else {
    pdpg = *mp;
    ...
}
sys/amd64/amd64/pmap.c
9666

Well, this just tightens up the NULL check, but ok.

Move the pdpg check as suggested.

This revision is now accepted and ready to land.Jun 24 2025, 4:59 PM
sys/amd64/amd64/pmap.c
7600

The most common use case for this function is pmap_enter_object(), and so only about one in 10,000 calls will actually perform the potentially problematic pmap_insert_pt_page(). The unnecessary allocations can easily be avoided by deferring the allocation until you are inside of the subsequent if statement. Put it right before the comment /* Break the existing mapping(s). */. Moreover, condition it on (oldpde & PG_PS) == 0 && va >= VM_MAXUSER_ADDRESS. Once you do this, then the much later call to pmap_insert_pt_page() that deals with wired mappings can revert back to its original behavior that does not assume a preallocated node. Most of the time, that preallocated node was not going to be used anyway, because the PTP is going to be referenced from an existing trie inode.

sys/amd64/amd64/pmap.c
7643

I think that we can be even more selective about node allocation than I suggested in my previous comment. If we are removing a kernel mapping here, then we are actually reinstalling the PTP with invalid mappings. In other words, the PTP was already in the trie. Instead, we could install an invalid PDE here, and leave the PTP in the trie. Then, ... (see below)

7648

If we are about to call this function on the kernel pmap, then beforehand we try performing the existing pmap_insert_pt_page() without any of the proposed changes. In other words, I don't see any reason why we can't attempt to insert the kernel PTP before actually destroying the 4KB granularity mappings in the PTE. Concurrent access to the trie by another thread will be blocked by the pmap lock.

kib edited the summary of this revision. (Show Details)
This revision now requires review to proceed.Jun 24 2025, 11:13 PM
kib marked 3 inline comments as done.Jun 24 2025, 11:13 PM

Correct the remove values.

sys/amd64/amd64/pmap.c
9640

I don't see why the final argument can't simply be a vm_page_t, not a vm_page_t *. When a vm_page_t is given the function is guaranteed to succeed. The caller can just set m to NULL itself.

10083

This can also fail. Unlike the regular kernel address space, we don't have preallocated page table pages for the direct map. In other words, we also need to preallocate a page table page for this demotion, but that only needs to happen if len < NBPDR. I will post suggested changes shortly.

sys/amd64/amd64/pmap.c
10083

I have posted suggested changes in D51091.

kib marked an inline comment as done.

Pass vm_page instead of &vm_page as pt to pmap_demote_pdpe().

sys/amd64/amd64/pmap.c
10063

Should we perform a WARN_SLEEPOK alongside the KASSERTs?

10092–10095

If you move this if statement up inside the prior if statement, then the m = NULL statement can be eliminated.

kib marked 2 inline comments as done.

pmap_demote_PDEP(): Add WITNESS_WARN, contain operations with m into global if.

sys/amd64/amd64/pmap.c
10057
10076
10091–10092
kib marked 8 inline comments as done.

Properly handle freeing of the allocated pt page.

markj added inline comments.
sys/amd64/amd64/pmap.c
7664
This revision is now accepted and ready to land.Jul 1 2025, 2:09 PM
kib marked an inline comment as done.Jul 1 2025, 3:35 PM