
Try translating the fault address with pmap unlocked.
Closed, Public

Authored by alc on Sep 17 2019, 10:47 AM.

Details

Summary

When the kernel performs a superpage promotion or demotion, it must
transiently leave the page tables in a state in which another CPU may
take a translation fault. This was handled by locking the pmap and
checking whether the translation still faults; however, acquiring the
lock can lead to a panic when the thread accessing one of these pages
is unable to sleep.

Test Plan

Lightly tested on an Amazon EC2 a1.2xlarge (8-core) virtual machine.


Event Timeline

andrew created this revision. Sep 17 2019, 10:47 AM
markj added a comment. Sep 17 2019, 2:33 PM

I don't see why this is sufficient. If the unlocked translation attempt fails, e.g., because the mapping is being promoted or demoted a second time, we may still attempt to acquire a mutex in a scenario where that is not allowed.

Can we use the same trick that pmap_kextract() does to identify mappings that are transiently invalid?

sys/arm64/arm64/pmap.c
5846

Typo: "operation".

5856

Typo: "again".

markj added a reviewer: alc. Sep 17 2019, 2:33 PM
alc added a comment. Sep 18 2019, 3:18 PM

> I don't see why this is sufficient. If the unlocked translation attempt fails, e.g., because the mapping is being promoted or demoted a second time, we may still attempt to acquire a mutex in a scenario where that is not allowed.
> Can we use the same trick that pmap_kextract() does to identify mappings that are transiently invalid?

Yes, I believe so. A "translation fault" only says that the page table walk encountered an invalid entry. It does not say anything about protection, and so the lock-free software page table walk that I wrote for pmap_kextract() should suffice.
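The lock-free software page table walk described here can be sketched in miniature. This is a toy two-level model, not the arm64 pmap's actual structures: `PTE_VALID`, the field layout, and all `toy_*` names are illustrative. The key property is that each level is read with a single atomic load, and an entry that is transiently invalid (mid break-before-make) simply yields "no translation" rather than blocking on a lock.

```c
#include <stdatomic.h>
#include <stdint.h>

#define PTE_VALID   0x1ULL
#define PAGE_SHIFT  12
#define PAGE_MASK   ((1ULL << PAGE_SHIFT) - 1)

typedef _Atomic uint64_t pte_t;

struct toy_pmap {
	pte_t *l1;		/* 512-entry level-1 table */
	pte_t **l2_tables;	/* one level-2 table per valid l1 entry */
};

/* Return the physical address for 'va', or 0 if no valid translation. */
static uint64_t
toy_kextract(struct toy_pmap *pm, uint64_t va)
{
	uint64_t l1e, l2e;
	unsigned l1i = (va >> 21) & 0x1ff;
	unsigned l2i = (va >> 12) & 0x1ff;

	l1e = atomic_load_explicit(&pm->l1[l1i], memory_order_acquire);
	if ((l1e & PTE_VALID) == 0)
		return (0);	/* possibly mid break-before-make */
	l2e = atomic_load_explicit(&pm->l2_tables[l1i][l2i],
	    memory_order_acquire);
	if ((l2e & PTE_VALID) == 0)
		return (0);
	return ((l2e & ~PAGE_MASK) | (va & PAGE_MASK));
}
```

As alc notes, this suffices precisely because a translation fault only reports an invalid entry on the walk, never a protection problem, so "entry invalid" is the only condition the walker has to detect.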

alc added a comment. Sep 19 2019, 6:00 PM

Here is a patch based on Mark's suggestion.

Index: arm64/arm64/pmap.c
===================================================================
--- arm64/arm64/pmap.c  (revision 352346)
+++ arm64/arm64/pmap.c  (working copy)
@@ -5810,13 +5810,21 @@ pmap_fault(pmap_t pmap, uint64_t esr, uint64_t far
        case ISS_DATA_DFSC_TF_L1:
        case ISS_DATA_DFSC_TF_L2:
        case ISS_DATA_DFSC_TF_L3:
+               if (pmap == kernel_pmap) {
+                       /*
+                        * The translation fault may have occurred within a
+                        * critical section.  Therefore, we must determine if
+                        * the address was invalid due to a break-before-make
+                        * sequence without acquiring the pmap lock.
+                        */
+                       if (pmap_kextract(far) != 0)
+                               rv = KERN_SUCCESS;
+                       break;
+               }
                PMAP_LOCK(pmap);
                /* Ask the MMU to check the address */
                intr = intr_disable();
-               if (pmap == kernel_pmap)
-                       par = arm64_address_translate_s1e1r(far);
-               else
-                       par = arm64_address_translate_s1e0r(far);
+               par = arm64_address_translate_s1e0r(far);
                intr_restore(intr);
                PMAP_UNLOCK(pmap);

@alc Can you commandeer this and update it with that patch?

alc commandeered this revision. Sep 20 2019, 2:59 PM
alc edited reviewers, added: andrew; removed: alc.
alc updated this revision to Diff 62350. Sep 20 2019, 3:22 PM
alc edited the summary of this revision.
alc edited the test plan for this revision.

On a translation fault within the kernel address space, perform a lock-free page table walk that handles transient invalidations due to superpage promotion or demotion.

markj accepted this revision. Sep 20 2019, 3:25 PM
This revision is now accepted and ready to land. Sep 20 2019, 3:25 PM
alc added a comment. Sep 20 2019, 5:44 PM

As an aside, I want to ask why pmap_update_entry() both blocks interrupts and performs critical_enter(). Doesn't the former suffice?

andrew added inline comments. Sep 20 2019, 6:52 PM
arm64/arm64/pmap.c
5824–5829 (On Diff #62350)

I think this should be in the else case.

alc updated this revision to Diff 62373. Sep 21 2019, 6:29 AM

Style: Replace "break" by "else".

This revision now requires review to proceed. Sep 21 2019, 6:29 AM
andrew accepted this revision. Sep 21 2019, 10:11 AM
This revision is now accepted and ready to land. Sep 21 2019, 10:11 AM
alc marked an inline comment as done. Sep 21 2019, 3:40 PM
markj accepted this revision. Sep 21 2019, 4:35 PM
In D21685#474145, @alc wrote:

> As an aside, I want to ask why pmap_update_entry() both blocks interrupts and performs critical_enter()? Doesn't the former suffice?

I believe the critical_enter()/exit() is superfluous.

alc added a comment. Sep 21 2019, 5:08 PM

An often forgotten fact is that there are parts of the kernel address space that are swappable: the exec and pipe submaps. If a page from one of these areas is written out to swap space, it will be write protected. If it is then written to before it is reclaimed by the page daemon, there will be a call to pmap_enter() that essentially performs a break-before-make sequence, but without blocking interrupts or using pmap_update_entry()'s "trick" for marking the mapping invalidation as transient.

In such a case, a concurrent access that occurs while the mapping is invalid will no longer be recognized by pmap_fault() as a transient fault, so we will wind up calling vm_fault() and pmap_enter() again. In other words, pmap_kextract() won't recognize the mapping change as transient, whereas previously, acquiring the kernel pmap's lock would have made pmap_fault() wait until the change had completed before rechecking the address. I don't think that this repeated call to vm_fault() and pmap_enter() will do any harm.
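For contrast, the pmap_update_entry() sequence discussed in this thread can be sketched as a toy model. The names are illustrative: in the real pmap, the window is bracketed by intr_disable()/intr_restore() and the flush is an actual TLB invalidation. The point is that the entry is transiently zero, and that zero window is exactly what a lock-free walker such as pmap_kextract() must tolerate.

```c
#include <stdatomic.h>
#include <stdint.h>

typedef _Atomic uint64_t pte_t;

static int toy_intr_disabled;
static int toy_tlb_flushes;

static void toy_intr_disable(void) { toy_intr_disabled = 1; }
static void toy_intr_restore(void) { toy_intr_disabled = 0; }
static void toy_tlb_flush(void)    { toy_tlb_flushes++; }

/* Toy break-before-make: invalidate, flush, then write the new entry. */
static void
toy_update_entry(pte_t *ptep, uint64_t newpte)
{
	toy_intr_disable();	/* no faults taken on this CPU */
	atomic_store_explicit(ptep, 0, memory_order_release);	/* break */
	toy_tlb_flush();	/* drop the stale cached translation */
	atomic_store_explicit(ptep, newpte,
	    memory_order_release);				/* make */
	toy_intr_restore();
}
```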

This revision was automatically updated to reflect the committed changes.