Page MenuHomeFreeBSD

Fixes for some corner cases of amd64 demotions.
ClosedPublic

Authored by kib on May 23 2019, 2:29 PM.

Details

Summary
  1. create_pagetables(): Fully fill the last kernel placeholder page table page. If it is ever demoted, we store wrong content in the pm_root radix.
  2. Ensure that all pre-promoted kernel page tables are inserted into the kernel pmap radix.

3, Correct the test for potentially invalid page table page in pmap_demote_pde_locked().

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Unit Tests Skipped
Build Status
Buildable 24511

Event Timeline

I think that the changes within pmap_demote_pde_locked(), including the two given in my comments, are correct and should be committed. The same changes should also be made to the i386 pmap.

I'd like to give some more thought to the KERNend versus kern_promoted_end changes.

sys/amd64/amd64/pmap.c
4550–4553
-		if (va < VM_MAXUSER_ADDRESS)
+		if (va < VM_MAXUSER_ADDRESS) {
+			mpte->wire_count = NPTEPG;
 			pmap_resident_count_inc(pmap, 1);
+		}
4566–4567
-	 * If the page table page is new, initialize it.
+	 * If the page table page is not leftover from an earlier promotion,
+	 * initialize it.
sys/amd64/amd64/pmap.c
1760

I think that we should keep KERNend as is and "fix" create_pagetables() instead. By "fix", I mean that we should change create_pagetables() so that it only maps physical memory up to KERNend. Currently, it pointlessly maps the page table memory that it allocates, even though none of these mappings will ever be used. Any accesses to the page table pages allocated in create_pagetables() will ultimately be performed through the direct map.

This revision was not accepted when it landed; it landed in state Needs Review.May 24 2019, 5:19 PM
This revision was automatically updated to reflect the committed changes.
sys/amd64/amd64/pmap.c
1760

Can you please elaborate ?

I do not see how could I not map the page tables into KTmap, due to the nature of recursive mapping. Also, the comments around initialization of initial page tables and directories look very unusual to me.

I _think_ that what you mean is that we should connect all preallocated page tables to directories, but only pre-promote to the rounded KERNend, and store in radix only page tables pages which were shadowed by the manual promotion.

I am uploading patch that reflects this view. It booted and some preloaded data was successfully freed, which involved demotion.

This revision is now accepted and ready to land.May 24 2019, 10:09 PM

I _think_ that what you mean is that we should connect all preallocated page tables to directories, but only pre-promote to the rounded KERNend, and store in radix only page tables pages which were shadowed by the manual promotion.

Yes, except that the way that the loop is written, we don't need to round KERNend for correctness. I would only round it if you think that doing so makes it less likely that someone will misinterpret the code. Here are all of the changes that I would make.

Index: amd64/amd64/pmap.c
===================================================================
--- amd64/amd64/pmap.c  (revision 348267)
+++ amd64/amd64/pmap.c  (working copy)
@@ -1338,7 +1338,6 @@ static void
 create_pagetables(vm_paddr_t *firstaddr)
 {
        int i, j, ndm1g, nkpdpe, nkdmpde;
-       pt_entry_t *pt_p;
        pd_entry_t *pd_p;
        pdp_entry_t *pdp_p;
        pml4_entry_t *p4_p;
@@ -1399,20 +1398,21 @@ create_pagetables(vm_paddr_t *firstaddr)
        KPTphys = allocpages(firstaddr, nkpt);
        KPDphys = allocpages(firstaddr, nkpdpe);
 
-       /* Fill in the underlying page table pages */
-       /* XXX not fully used, underneath 2M pages */
-       pt_p = (pt_entry_t *)KPTphys;
-       for (i = 0; ptoa(i) < *firstaddr; i++)
-               pt_p[i] = ptoa(i) | X86_PG_V | pg_g | bootaddr_rwx(ptoa(i));
-
-       /* Now map the page tables at their location within PTmap */
+       /*
+        * Connect the zero-filled PT pages to their PD entries.  This
+        * implicitly maps the PT pages at their correct locations within
+        * the PTmap.
+        */
        pd_p = (pd_entry_t *)KPDphys;
        for (i = 0; i < nkpt; i++)
                pd_p[i] = (KPTphys + ptoa(i)) | X86_PG_RW | X86_PG_V;
 
-       /* Map from zero to end of allocations under 2M pages */
-       /* This replaces some of the KPTphys entries above */
-       for (i = 0; (i << PDRSHIFT) < *firstaddr; i++)
+       /*
+        * Map from physical address zero to the end of loader preallocated
+        * memory using 2MB pages.  This replaces some of the PD entries
+        * created above.
+        */
+       for (i = 0; (i << PDRSHIFT) < KERNend; i++)
                /* Preset PG_M and PG_A because demotion expects it. */
                pd_p[i] = (i << PDRSHIFT) | X86_PG_V | PG_PS | pg_g |
                    X86_PG_M | X86_PG_A | bootaddr_rwx(i << PDRSHIFT);
@@ -1422,7 +1422,8 @@ create_pagetables(vm_paddr_t *firstaddr)
         * to record the physical blocks we've actually mapped into kernel
         * virtual address space.
         */
-       *firstaddr = round_2mpage(*firstaddr);
+       if (*firstaddr < round_2mpage(KERNend))
+               *firstaddr = round_2mpage(KERNend);
 
        /* And connect up the PD to the PDP (leaving room for L4 pages) */
        pdp_p = (pdp_entry_t *)(KPDPphys + ptoa(KPML4I - KPML4BASE));
@@ -1529,7 +1530,10 @@ pmap_bootstrap(vm_paddr_t *firstaddr)
         */
        vm_phys_add_seg(KPTphys, KPTphys + ptoa(nkpt));
 
-       virtual_avail = (vm_offset_t) KERNBASE + *firstaddr;
+       /*
+        * Account for the virtual addresses mapped by create_pagetables().
+        */
+       virtual_avail = (vm_offset_t)KERNBASE + round_2mpage(KERNend);
        virtual_end = VM_MAX_KERNEL_ADDRESS;
 
        /*
In D20380#440605, @alc wrote:

I _think_ that what you mean is that we should connect all preallocated page tables to directories, but only pre-promote to the rounded KERNend, and store in radix only page tables pages which were shadowed by the manual promotion.

Yes, except that the way that the loop is written, we don't need to round KERNend for correctness. I would only round it if you think that doing so makes it less likely that someone will misinterpret the code. Here are all of the changes that I would make.

I do not think that the check needs changing.
Also I see that indeed page table pages do not need non-zero initialization after we switched to PG_PROMOTED test.

I am not sure why do check that KERNend != round_2mpage() before rounding, IMO it simply less lines to not do that.

alc' version of the patch, with one minor style fix.
Testing this properly almost certainly requires D20266.

In D20380#440609, @kib wrote:
In D20380#440605, @alc wrote:

I _think_ that what you mean is that we should connect all preallocated page tables to directories, but only pre-promote to the rounded KERNend, and store in radix only page tables pages which were shadowed by the manual promotion.

Yes, except that the way that the loop is written, we don't need to round KERNend for correctness. I would only round it if you think that doing so makes it less likely that someone will misinterpret the code. Here are all of the changes that I would make.

I do not think that the check needs changing.
Also I see that indeed page table pages do not need non-zero initialization after we switched to PG_PROMOTED test.

I am not sure why do check that KERNend != round_2mpage() before rounding, IMO it simply less lines to not do that.

Are you asking about this snippet?

if (*firstaddr < round_2mpage(KERNend))
        *firstaddr = round_2mpage(KERNend);
In D20380#440642, @alc wrote:
In D20380#440609, @kib wrote:

I am not sure why do check that KERNend != round_2mpage() before rounding, IMO it simply less lines to not do that.

Are you asking about this snippet?

if (*firstaddr < round_2mpage(KERNend))
        *firstaddr = round_2mpage(KERNend);

Yes.

In D20380#440650, @kib wrote:
In D20380#440642, @alc wrote:
In D20380#440609, @kib wrote:

I am not sure why do check that KERNend != round_2mpage() before rounding, IMO it simply less lines to not do that.

Are you asking about this snippet?

if (*firstaddr < round_2mpage(KERNend))
        *firstaddr = round_2mpage(KERNend);

Yes.

Sometimes the allocpages() calls advance *firstaddr past the end of the last 2MB page mapping. In that case, this conditional avoids wasting an average of 1MB of physical memory. I tested this conditional by tweaking nkpt.

More generally, I'm skeptical of the safety/security benefits of ever rounding *firstaddr, but I'm not going to argue against it today.

Everything here looks good to me.

In D20380#440728, @alc wrote:

Everything here looks good to me.

Does this mean 'clean to commit' ? I think testing with D20266 can wait for D20266 itself.

In D20380#440729, @kib wrote:
In D20380#440728, @alc wrote:

Everything here looks good to me.

Does this mean 'clean to commit' ? I think testing with D20266 can wait for D20266 itself.

Yes.