Page MenuHomeFreeBSD

pmap: Avoid clearing the accessed bit for wired mappings
Needs ReviewPublic

Authored by markj on Mar 21 2025, 2:09 PM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Jul 31, 11:57 PM
Unknown Object (File)
Thu, Jul 31, 9:21 PM
Unknown Object (File)
Thu, Jul 31, 9:35 AM
Unknown Object (File)
Thu, Jul 31, 2:20 AM
Unknown Object (File)
Wed, Jul 30, 11:23 AM
Unknown Object (File)
Wed, Jul 30, 5:48 AM
Unknown Object (File)
Tue, Jul 29, 3:33 AM
Unknown Object (File)
Sat, Jul 26, 9:44 PM
Subscribers

Details

Reviewers
alc
kib
andrew
manu
Summary

Currently pmap_advise() does not handle wired mappings specially in the
4KB mapping case: it will clear the accessed bit even though wired pages
cannot be reclaimed. This is mostly harmless in principle, but it can
violate an invariant in pmap_demote_pde_locked(), which assumes that a
superpage mapping with the accessed bit clear must not be wired. When
it encounters such a superpage, the mapping is deleted instead of being
demoted, but this is not permitted for wired mappings.

Since pmap_advise() can reasonably avoid clearing the accessed bit for
wired mappings, let's do that instead of adding additional complexity to
the promotion and demotion paths. For instance, one alternate solution
would be to have pmap_promote_pde() always set PG_A on a wired superpage
mapping, even when constituent PTEs do not all have it set, but I'm not
sure that that's preferable.

I believe the same problem exists on arm64 and will update the diff once
there's consensus on the right solution.

Reported by: syzbot+4b9dad11826c30bb6745@syzkaller.appspotmail.com

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 65312
Build 62195: arc lint + arc unit

Event Timeline

markj requested review of this revision.Mar 21 2025, 2:09 PM
kib added inline comments.
sys/amd64/amd64/pmap.c
9308

I suspect that the new code would be clearer if you slightly restructure it:

   			if ((*pte & (PG_MANAGED | PG_V)) != (PG_MANAGED | PG_V))
				goto maybe_invlrng;
			else if ((*pte & (PG_M | PG_RW)) == (PG_M | PG_RW)) {
				if (advice == MADV_DONTNEED) {
					/*
					 * Future calls to pmap_is_modified()
					 * can be avoided by making the page
					 * dirty now.
					 */
					m = PHYS_TO_VM_PAGE(*pte & PG_FRAME);
					vm_page_dirty(m);
				}
			} else if ((*pte & PG_A) == 0)
				goto maybe_invlrng;
                        atomic_clear_long(pte, PG_M | ((*pte & PG_W) == 0 ? PG_A : 0));
This revision is now accepted and ready to land.Mar 21 2025, 11:17 PM

Simplify according to kib's suggestion.

This revision now requires review to proceed.Mar 23 2025, 2:40 AM
kib added inline comments.
sys/amd64/amd64/pmap.c
9299

IMO the comment should be moved right above the new atomic().

This revision is now accepted and ready to land.Mar 23 2025, 6:42 AM
markj marked 2 inline comments as done.

Move the comment.

This revision now requires review to proceed.Mar 23 2025, 9:40 AM
This revision is now accepted and ready to land.Mar 23 2025, 10:28 AM

@alc did you have any thoughts on this patch?

@alc did you have any thoughts on this patch?

Thinking...

A more direct approach would be to change pmap_demote_pde_locked() to handle wired mappings when the PDE was never accessed:

diff --git a/sys/amd64/amd64/pmap.c b/sys/amd64/amd64/pmap.c
index 6d1c2d70d8c0..97ff9c67e8d5 100644
--- a/sys/amd64/amd64/pmap.c
+++ b/sys/amd64/amd64/pmap.c
@@ -6104,9 +6104,7 @@ pmap_demote_pde_locked(pmap_t pmap, pd_entry_t *pde, vm_offset_t va,
         * Invalidate the 2MB page mapping and return "failure" if the
         * mapping was never accessed.
         */
-       if ((oldpde & PG_A) == 0) {
-               KASSERT((oldpde & PG_W) == 0,
-                   ("pmap_demote_pde: a wired mapping is missing PG_A"));
+       if ((oldpde & (PG_W | PG_A)) == 0) {
                pmap_demote_pde_abort(pmap, va, pde, oldpde, lockp);
                return (false);
        }
@@ -6164,7 +6162,7 @@ pmap_demote_pde_locked(pmap_t pmap, pd_entry_t *pde, vm_offset_t va,
         * have PG_A set in every PTE, then fill it.  The new PTEs will all
         * have PG_A set.
         */
-       if (!vm_page_all_valid(mpte))
+       if (vm_page_all_valid(mpte) ^ (oldpde & PG_A) != 0)
                pmap_fill_ptp(firstpte, newpte);
 
        pmap_demote_pde_check(firstpte, newpte);

The comments would also need updating, particularly for the second part.

I'm really on the fence about this. I could make a respectable argument for either approach.

In D49442#1163641, @alc wrote:

A more direct approach would be to change pmap_demote_pde_locked() to handle wired mappings when the PDE was never accessed:

diff --git a/sys/amd64/amd64/pmap.c b/sys/amd64/amd64/pmap.c
index 6d1c2d70d8c0..97ff9c67e8d5 100644
--- a/sys/amd64/amd64/pmap.c
+++ b/sys/amd64/amd64/pmap.c
@@ -6104,9 +6104,7 @@ pmap_demote_pde_locked(pmap_t pmap, pd_entry_t *pde, vm_offset_t va,
         * Invalidate the 2MB page mapping and return "failure" if the
         * mapping was never accessed.
         */
-       if ((oldpde & PG_A) == 0) {
-               KASSERT((oldpde & PG_W) == 0,
-                   ("pmap_demote_pde: a wired mapping is missing PG_A"));
+       if ((oldpde & (PG_W | PG_A)) == 0) {

Just for my understanding, do you mean

if ((oldpde & (PG_W | PG_A)) == PG_W) {

?

        pmap_demote_pde_abort(pmap, va, pde, oldpde, lockp);
        return (false);
}

@@ -6164,7 +6162,7 @@ pmap_demote_pde_locked(pmap_t pmap, pd_entry_t *pde, vm_offset_t va,

    • have PG_A set in every PTE, then fill it. The new PTEs will all
    • have PG_A set. */
  • if (!vm_page_all_valid(mpte))

+ if (vm_page_all_valid(mpte) ^ (oldpde & PG_A) != 0)

        pmap_fill_ptp(firstpte, newpte);
 
pmap_demote_pde_check(firstpte, newpte);
The comments would also need updating, particularly for the second part.

I'm really on the fence about this.  I could make a respectable argument for either approach.
In D49442#1163665, @kib wrote:
In D49442#1163641, @alc wrote:

A more direct approach would be to change pmap_demote_pde_locked() to handle wired mappings when the PDE was never accessed:

diff --git a/sys/amd64/amd64/pmap.c b/sys/amd64/amd64/pmap.c
index 6d1c2d70d8c0..97ff9c67e8d5 100644
--- a/sys/amd64/amd64/pmap.c
+++ b/sys/amd64/amd64/pmap.c
@@ -6104,9 +6104,7 @@ pmap_demote_pde_locked(pmap_t pmap, pd_entry_t *pde, vm_offset_t va,
         * Invalidate the 2MB page mapping and return "failure" if the
         * mapping was never accessed.
         */
-       if ((oldpde & PG_A) == 0) {
-               KASSERT((oldpde & PG_W) == 0,
-                   ("pmap_demote_pde: a wired mapping is missing PG_A"));
+       if ((oldpde & (PG_W | PG_A)) == 0) {

Just for my understanding, do you mean

if ((oldpde & (PG_W | PG_A)) == PG_W) {

?

No, it is correct as is. If the PDE has the wired bit set, then I don't want to execute the "then" clause, which destroys the mapping. Instead, I want to skip it, and reinstall a page table page with the 4KB granularity wired mappings.

In D49442#1163641, @alc wrote:

I'm really on the fence about this. I could make a respectable argument for either approach.

Perhaps it's reasonable to simply implement both approaches? The patch in review makes some sense on its own, independent of the bug which motivated it: I can't see any practical reason for madvise(DONTNEED or FREE) to have any side effects on wired mappings. On the other hand, it seems sensible to modify the demotion path to handle this case, since we have no rule which says that wired mappings must have PG_A set, so this case could arise again in the future.

Modify the arm64 pmap to avoid clearing ATTR_AF on wired PTEs as well.

This revision now requires review to proceed.Wed, Jul 9, 8:48 PM
In D49442#1163641, @alc wrote:

I'm really on the fence about this. I could make a respectable argument for either approach.

Perhaps it's reasonable to simply implement both approaches? The patch in review makes some sense on its own, independent of the bug which motivated it: I can't see any practical reason for madvise(DONTNEED or FREE) to have any side effects on wired mappings. On the other hand, it seems sensible to modify the demotion path to handle this case, since we have no rule which says that wired mappings must have PG_A set, so this case could arise again in the future.

I looked at vm_map_madvise(), vm_object_madvise(), and vm_page_advise() to see how they handled wired mappings. Overall, I would describe it as inconsistent. Specifically, while pmap_advise() will currently clear the modified and referenced flags on wired PTEs, vm_object_madvise() will leave a wired page's dirty field and PGA_REFERENCED flag untouched. So, if the page already had its dirty field and PGA_REFERENCED flag set before it was wired, an madvise(MADV_{FREE,DONTNEED}) will leave them set.

IMHO, this makes less sense than consistently doing nothing at both the pmap and machine-independent levels or clearing the fields/flags at both levels. To be clear, this could still mean handling the access flags and dirty field/flag differently. For example, we might consistently clear the dirty field/flag, but not touch the access flags. As to which option we choose, I don't feel strongly. I agree with @markj that clearing the accessed flags has little value. What do others think?

sys/amd64/amd64/pmap.c
9254

In the case of MADV_DONTNEED on a wired mapping, if we are not going to clear the accessed flag, then there is no reason to perform the demotion. The only thing that we lose by not demoting is the opportunistic setting of the page's dirty field that occurs below.

I do not see how accessed field on the wired mapping is useful for us at all. From that PoV, doing nothing to PG_A seems to be most logical for all advise ops.