Page MenuHomeFreeBSD

Rethink r348246
ClosedPublic

Authored by alc on Jun 6 2019, 6:45 PM.

Details

Summary

Both arm64 and riscv have the same problem that we addressed in r348246 on amd64 and i386. However, they do not define a PG_PROMOTED bit in their level 2 PTEs. PG_PROMOTED makes no sense on arm64, where we are required to "break before make". And, on riscv, there are no spare bits in the PTE. (Only two bits are available to the OS, and we have used them both.) Instead, I propose to use the page table page's valid field. And, in fact, I propose to switch amd64 (and i386) to this approach. Aside from my belief that having a single, portable approach across the different architectures is a virtue, when the erratum 383 work-around is enabled on AMD processors, we are currently performing unnecessary page table page initializations on demotion because we don't set PG_PROMOTED in the PDE when the 383 work-around is enabled.

In addition, remove a stale comment.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

alc created this revision.Jun 6 2019, 6:45 PM
alc edited the summary of this revision. (Show Details)Jun 6 2019, 6:47 PM
markj added a comment.Jun 6 2019, 6:49 PM

Could you please include more context for the diff?

kib accepted this revision.Jun 6 2019, 6:59 PM
This revision is now accepted and ready to land.Jun 6 2019, 6:59 PM
markj accepted this revision.Jun 6 2019, 7:07 PM

Note that with this change the comment near the beginning of riscv's pmap_remove_l2() doesn't apply anymore.

alc updated this revision to Diff 58328.Jun 6 2019, 8:55 PM

Add i386. Provide context.

This revision now requires review to proceed.Jun 6 2019, 8:55 PM
alc updated this revision to Diff 58329.Jun 6 2019, 9:00 PM

Correct i386. I forgot the "(mpte->valid == 0)" test.

alc added a comment.Jun 6 2019, 9:04 PM

Note that with this change the comment near the beginning of riscv's pmap_remove_l2() doesn't apply anymore.

This comment?

/*                                                                                                                                                                                                
 * The sfence.vma documentation states that it is sufficient to specify                                                                                                                           
 * a single address within a superpage mapping.  However, since we do                                                                                                                             
 * not perform any invalidation upon promotion, TLBs may still be                                                                                                                                 
 * caching 4KB mappings within the superpage, so we must invalidate the                                                                                                                           
 * entire range.                                                                                                                                                                                  
 */
alc added a comment.Jun 7 2019, 5:21 AM

This approach would allow us to skip the pagezero() call in pmap_remove_kernel_pde() when mpte->valid is zero. For reference, here is the unmodified pmap_remove_kernel_pde():

static void
pmap_remove_kernel_pde(pmap_t pmap, pd_entry_t *pde, vm_offset_t va)
{
        pd_entry_t newpde;
        vm_paddr_t mptepa;
        vm_page_t mpte;

        KASSERT(pmap == kernel_pmap, ("pmap %p is not kernel_pmap", pmap));
        PMAP_LOCK_ASSERT(pmap, MA_OWNED);
        mpte = pmap_remove_pt_page(pmap, va);
        if (mpte == NULL)
                panic("pmap_remove_kernel_pde: Missing pt page.");

        mptepa = VM_PAGE_TO_PHYS(mpte);
        newpde = mptepa | X86_PG_M | X86_PG_A | X86_PG_RW | X86_PG_V;

        /*                                                                                                              
         * Initialize the page table page.                                                                              
         */
        pagezero((void *)PHYS_TO_DMAP(mptepa));
markj accepted this revision.Jun 7 2019, 5:04 PM
In D20538#443875, @alc wrote:

Note that with this change the comment near the beginning of riscv's pmap_remove_l2() doesn't apply anymore.

This comment?

/*                                                                                                                                                                                                
 * The sfence.vma documentation states that it is sufficient to specify                                                                                                                           
 * a single address within a superpage mapping.  However, since we do                                                                                                                             
 * not perform any invalidation upon promotion, TLBs may still be                                                                                                                                 
 * caching 4KB mappings within the superpage, so we must invalidate the                                                                                                                           
 * entire range.                                                                                                                                                                                  
 */

Right. Sorry, I was thinking a step ahead. It is still true, but now we can detect whether the mapping was promoted, so it is no longer necessary to unconditionally invalidate the entire range.

This revision is now accepted and ready to land.Jun 7 2019, 5:04 PM
kib accepted this revision.Jun 7 2019, 5:40 PM
alc updated this revision to Diff 58389.Jun 8 2019, 5:15 AM

Conditionally zero the page table page in pmap_remove_kernel_{l2,pde}().

On i386, change the pmap_insert_pt_page() call in pmap_init() to treat the page table page as promoted, because the page table pages contain valid mappings.

Add assertions that the page table page was saved because of a promotion after some pmap_remove_pt_page() calls.

This revision now requires review to proceed.Jun 8 2019, 5:15 AM
alc added a comment.Jun 8 2019, 5:17 AM

I'm finally done with this patch. I have no further changes in mind.

alc updated this revision to Diff 58390.Jun 8 2019, 5:32 AM

Correct a typo in the comment at the head of pmap_insert_pt_page() on riscv.

kib accepted this revision.Jun 8 2019, 11:39 AM
This revision is now accepted and ready to land.Jun 8 2019, 11:39 AM
alc added a comment.Jun 8 2019, 8:36 PM
In D20538#443875, @alc wrote:

Note that with this change the comment near the beginning of riscv's pmap_remove_l2() doesn't apply anymore.

This comment?

/*                                                                                                                                                                                                
 * The sfence.vma documentation states that it is sufficient to specify                                                                                                                           
 * a single address within a superpage mapping.  However, since we do                                                                                                                             
 * not perform any invalidation upon promotion, TLBs may still be                                                                                                                                 
 * caching 4KB mappings within the superpage, so we must invalidate the                                                                                                                           
 * entire range.                                                                                                                                                                                  
 */

Right. Sorry, I was thinking a step ahead. It is still true, but now we can detect whether the mapping was promoted, so it is no longer necessary to unconditionally invalidate the entire range.

I still wish that we had a spare bit to define a PG_PROMOTED flag on riscv, because I think that it is a better way to avoid unnecessary invalidations. In particular, in pmap_protect_pde() on x86, we are able to simultaneously change the protection and clear PG_PROMOTED without computing "mpte" and touching any of its fields.

That said, even better than the current PG_PROMOTED on x86 would be having the ability to recognize when a pmap_invalidate_all() had been performed after the promotion and so any leftover, prepromotion 4KB mappings are already flushed. Any ideas on how to do this?

alc added a comment.Jun 8 2019, 8:48 PM

As an aside, my reading of the demotion code on arm64 is that it leaves the superpage mapping in place when it aborts/fails. In other words, it does not invalidate the mapping. That is not good.

markj added a comment.Jun 9 2019, 2:55 AM
In D20538#444501, @alc wrote:
In D20538#443875, @alc wrote:

Note that with this change the comment near the beginning of riscv's pmap_remove_l2() doesn't apply anymore.

This comment?

/*                                                                                                                                                                                                
 * The sfence.vma documentation states that it is sufficient to specify                                                                                                                           
 * a single address within a superpage mapping.  However, since we do                                                                                                                             
 * not perform any invalidation upon promotion, TLBs may still be                                                                                                                                 
 * caching 4KB mappings within the superpage, so we must invalidate the                                                                                                                           
 * entire range.                                                                                                                                                                                  
 */

Right. Sorry, I was thinking a step ahead. It is still true, but now we can detect whether the mapping was promoted, so it is no longer necessary to unconditionally invalidate the entire range.

I still wish that we had a spare bit to define a PG_PROMOTED flag on riscv, because I think that it is a better way to avoid unnecessary invalidations. In particular, in pmap_protect_pde() on x86, we are able to simultaneously change the protection and clear PG_PROMOTED without computing "mpte" and touching any of its fields.
That said, even better than the current PG_PROMOTED on x86 would be having the ability to recognize when a pmap_invalidate_all() had been performed after the promotion and so any leftover, prepromotion 4KB mappings are already flushed. Any ideas on how to do this?

Perhaps something along the lines of, maintain a per-pmap generation counter that gets incremented when pmap_invalidate_all() is called. Any time a PDE is promoted, the PD page caches the current value in its md_page struct. Then, when invalidating a 2MB superpage, we only need to invalidate the entire range if the PD page's generation is equal to the pmap's and the PDE has PG_PROMOTED set. This can still result in unnecessary invalidations since the PD page's generation is used for up to 512 promoted PDEs. Instead, we could save the generation in the PT page during a promotion, but that has the (small?) downside of requiring a radix lookup in pmap_invalidate_pde_page(). I believe that if PG_PROMOTED is set we must always be able to find the PT page. Is that right?

This revision was automatically updated to reflect the committed changes.
markj added a comment.Jun 9 2019, 3:36 AM

Rather than overloading the valid field, I would prefer to explicitly have a union { vm_page_bits_t valid; vm_page_bits_t promoted; } so that the MI field can be changed without excessive churn if need be. wire_count is similarly overloaded and I had to work around that in D20486.

alc added a comment.Jun 9 2019, 6:17 AM
In D20538#444501, @alc wrote:
In D20538#443875, @alc wrote:

Note that with this change the comment near the beginning of riscv's pmap_remove_l2() doesn't apply anymore.

This comment?

/*                                                                                                                                                                                                
 * The sfence.vma documentation states that it is sufficient to specify                                                                                                                           
 * a single address within a superpage mapping.  However, since we do                                                                                                                             
 * not perform any invalidation upon promotion, TLBs may still be                                                                                                                                 
 * caching 4KB mappings within the superpage, so we must invalidate the                                                                                                                           
 * entire range.                                                                                                                                                                                  
 */

Right. Sorry, I was thinking a step ahead. It is still true, but now we can detect whether the mapping was promoted, so it is no longer necessary to unconditionally invalidate the entire range.

I still wish that we had a spare bit to define a PG_PROMOTED flag on riscv, because I think that it is a better way to avoid unnecessary invalidations. In particular, in pmap_protect_pde() on x86, we are able to simultaneously change the protection and clear PG_PROMOTED without computing "mpte" and touching any of its fields.
That said, even better than the current PG_PROMOTED on x86 would be having the ability to recognize when a pmap_invalidate_all() had been performed after the promotion and so any leftover, prepromotion 4KB mappings are already flushed. Any ideas on how to do this?

Perhaps something along the lines of, maintain a per-pmap generation counter that gets incremented when pmap_invalidate_all() is called. Any time a PDE is promoted, the PD page caches the current value in its md_page struct. Then, when invalidating a 2MB superpage, we only need to invalidate the entire range if the PD page's generation is equal to the pmap's and the PDE has PG_PROMOTED set. This can still result in unnecessary invalidations since the PD page's generation is used for up to 512 promoted PDEs. Instead, we could save the generation in the PT page during a promotion, but that has the (small?) downside of requiring a radix lookup in pmap_invalidate_pde_page(). I believe that if PG_PROMOTED is set we must always be able to find the PT page. Is that right?

Yes.

I believe that pmap_protect_pde() is the only caller to pmap_invalidate_pde_page() that doesn't perform the radix lookup, so the added cost of storing the generation number in the PT page would be small.

alc added a comment.Jun 9 2019, 6:20 AM

Rather than overloading the valid field, I would prefer to explicitly have a union { vm_page_bits_t valid; vm_page_bits_t promoted; } so that the MI field can be changed without excessive churn if need be. wire_count is similarly overloaded and I had to work around that in D20486.

Okay. I'll add this to my TODO list.