Try translating the fault address with pmap unlocked.
ClosedPublic

Authored by alc on Sep 17 2019, 10:47 AM.

Details

Summary

When the kernel performs a superpage promotion or demotion, it transiently leaves the page tables in a state in which another CPU may take a translation fault. Such a fault is currently handled by locking the pmap and rechecking whether the translation still faults; however, acquiring the lock can lead to a panic when the thread accessing one of these pages is not permitted to sleep.
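
For context, the pre-patch handling of these translation faults, reconstructed from the context and removed lines of the diff further down, looks roughly like the fragment below. The check of PAR_EL1 after the unlock is not shown in the diff and is only summarized in a comment here.

	case ISS_DATA_DFSC_TF_L1:
	case ISS_DATA_DFSC_TF_L2:
	case ISS_DATA_DFSC_TF_L3:
		PMAP_LOCK(pmap);
		/* Ask the MMU to check the address */
		intr = intr_disable();
		if (pmap == kernel_pmap)
			par = arm64_address_translate_s1e1r(far);
		else
			par = arm64_address_translate_s1e0r(far);
		intr_restore(intr);
		PMAP_UNLOCK(pmap);
		/*
		 * If PAR_EL1 reports a successful translation, the entry was
		 * only transiently invalid and the fault can be retried
		 * (check elided here).  The problem is PMAP_LOCK() itself:
		 * it acquires the pmap mutex, which is what can panic when
		 * the faulting thread is in a context that may not sleep.
		 */
		break;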

Test Plan

Lightly tested on an Amazon EC2 a1.2xlarge (8-core) virtual machine.

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 26529
Build 24926: arc lint + arc unit

Event Timeline

I don't see why this is sufficient. If the unlocked translation attempt fails, e.g., because the mapping is being promoted or demoted a second time, we may still attempt to acquire a mutex in a scenario where that is not allowed.

Can we use the same trick that pmap_kextract() does to identify mappings that are transiently invalid?

sys/arm64/arm64/pmap.c
5846

Typo: "operation".

5856

Typo: "again".

I don't see why this is sufficient. If the unlocked translation attempt fails, e.g., because the mapping is being promoted or demoted a second time, we may still attempt to acquire a mutex in a scenario where that is not allowed.

Can we use the same trick that pmap_kextract() does to identify mappings that are transiently invalid?

Yes, I believe so. A "translation fault" only says that the page table walk encountered an invalid entry. It does not say anything about protection, and so the lock-free software page table walk that I wrote for pmap_kextract() should suffice.
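
For illustration, a lock-free software walk in the spirit of pmap_kextract() might look like the sketch below. This is not the actual FreeBSD code: phys_to_virt() and the descriptor constants are simplified assumptions for a three-level, 4 KB-granule table. The point is that every descriptor is read atomically, and an invalid descriptor at any level, including one that is only transiently invalid because of a concurrent break-before-make sequence, simply makes the walk report failure rather than blocking on a lock.

#include <stdint.h>
#include <stdatomic.h>

typedef uint64_t pte_t;

#define	DESCR_VALID	0x1ULL			/* bit 0: descriptor is valid */
#define	DESCR_TABLE	0x2ULL			/* bit 1: next-level table */
#define	OA_MASK		0x0000fffffffff000ULL	/* output address, bits 47:12 */

/* Assumption: maps a table's physical address to a directly usable pointer. */
extern _Atomic pte_t *phys_to_virt(uint64_t pa);

/*
 * Lock-free software walk of a three-level, 4 KB-granule page table.
 * Returns the physical address backing va, or 0 if any descriptor on the
 * walk is invalid -- including one that is only transiently invalid
 * because of a concurrent break-before-make sequence.
 */
uint64_t
lockfree_extract(_Atomic pte_t *table, uint64_t va)
{
	pte_t desc;
	int level, shift;

	for (level = 1; level <= 3; level++) {
		shift = 39 - 9 * level;		/* 30, 21, 12 */
		desc = atomic_load_explicit(&table[(va >> shift) & 0x1ff],
		    memory_order_acquire);
		if ((desc & DESCR_VALID) == 0)
			return (0);	/* invalid, possibly only transiently */
		if (level == 3)		/* a valid level 3 entry is a page */
			return ((desc & OA_MASK) | (va & 0xfff));
		if ((desc & DESCR_TABLE) == 0)	/* block (superpage) mapping */
			return ((desc & OA_MASK & ~((1ULL << shift) - 1)) |
			    (va & ((1ULL << shift) - 1)));
		table = phys_to_virt(desc & OA_MASK);
	}
	return (0);			/* not reached */
}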

Here is a patch based on Mark's suggestion.

Index: arm64/arm64/pmap.c
===================================================================
--- arm64/arm64/pmap.c  (revision 352346)
+++ arm64/arm64/pmap.c  (working copy)
@@ -5810,13 +5810,21 @@ pmap_fault(pmap_t pmap, uint64_t esr, uint64_t far
        case ISS_DATA_DFSC_TF_L1:
        case ISS_DATA_DFSC_TF_L2:
        case ISS_DATA_DFSC_TF_L3:
+               if (pmap == kernel_pmap) {
+                       /*
+                        * The translation fault may have occurred within a
+                        * critical section.  Therefore, we must determine if
+                        * the address was invalid due to a break-before-make
+                        * sequence without acquiring the pmap lock.
+                        */
+                       if (pmap_kextract(far) != 0)
+                               rv = KERN_SUCCESS;
+                       break;
+               }
                PMAP_LOCK(pmap);
                /* Ask the MMU to check the address */
                intr = intr_disable();
-               if (pmap == kernel_pmap)
-                       par = arm64_address_translate_s1e1r(far);
-               else
-                       par = arm64_address_translate_s1e0r(far);
+               par = arm64_address_translate_s1e0r(far);
                intr_restore(intr);
                PMAP_UNLOCK(pmap);
 

@alc Can you commandeer this and update it with that patch?

alc edited reviewers, added: andrew; removed: alc.
alc edited the summary of this revision. (Show Details)
alc edited the test plan for this revision. (Show Details)

On a translation fault within the kernel address space, perform a lock-free page table walk that handles transient invalidations due to superpage promotion or demotion.

This revision is now accepted and ready to land. Sep 20 2019, 3:25 PM

As an aside, I want to ask why pmap_update_entry() both blocks interrupts and performs critical_enter(). Doesn't the former suffice?

arm64/arm64/pmap.c
5824–5829 (On Diff #62350)

I think this should be in the else case.

Style: Replace "break" by "else".
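
Read together, these two comments appear to suggest a structure along the lines of the sketch below (the handling of PAR_EL1 after the unlock is elided, as in the diff above):

	case ISS_DATA_DFSC_TF_L1:
	case ISS_DATA_DFSC_TF_L2:
	case ISS_DATA_DFSC_TF_L3:
		if (pmap == kernel_pmap) {
			/*
			 * The translation fault may have occurred within a
			 * critical section, so check the address without
			 * acquiring the pmap lock.
			 */
			if (pmap_kextract(far) != 0)
				rv = KERN_SUCCESS;
		} else {
			PMAP_LOCK(pmap);
			/* Ask the MMU to check the address */
			intr = intr_disable();
			par = arm64_address_translate_s1e0r(far);
			intr_restore(intr);
			PMAP_UNLOCK(pmap);
			/* PAR_EL1 check elided. */
		}
		break;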

This revision now requires review to proceed. Sep 21 2019, 6:29 AM
This revision is now accepted and ready to land. Sep 21 2019, 10:11 AM
alc marked an inline comment as done. Sep 21 2019, 3:40 PM
In D21685#474145, @alc wrote:

As an aside, I want to ask why pmap_update_entry() both blocks interrupts and performs critical_enter(). Doesn't the former suffice?

I believe the critical_enter()/exit() is superfluous.
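
For reference, the pattern under discussion is roughly the following (an illustrative sketch, not the actual pmap_update_entry(); the entry-update and TLB-invalidation helpers are placeholders):

/* Placeholders for the real entry-update and TLB-invalidation primitives. */
void clear_entry(pd_entry_t *pde);
void store_entry(pd_entry_t *pde, pd_entry_t newpde);
void invalidate_range(pmap_t pmap, vm_offset_t sva, vm_offset_t eva);

static void
update_entry_sketch(pmap_t pmap, pd_entry_t *pde, pd_entry_t newpde,
    vm_offset_t va, vm_size_t size)
{
	register_t intr;

	/*
	 * Disabling interrupts keeps an interrupt handler on this CPU from
	 * touching the mapping while it is transiently invalid, and it also
	 * prevents the thread from being switched out mid-update.  That is
	 * why the additional critical_enter()/critical_exit() pair in the
	 * real code looks superfluous, as noted above.
	 */
	intr = intr_disable();

	/* Break: invalidate the old entry and flush the TLB. */
	clear_entry(pde);
	invalidate_range(pmap, va, va + size);

	/* Make: install the new entry. */
	store_entry(pde, newpde);

	intr_restore(intr);
}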

An often forgotten fact is that there are parts of the kernel address space that are swappable, namely the exec and pipe submaps. If a page from one of these areas gets written out to swap space, it will be write protected. If it is then written to before it is reclaimed by the page daemon, there will be a call to pmap_enter() that essentially performs a break-before-make sequence, but without blocking interrupts or using pmap_update_entry()'s "trick" for marking the mapping invalidation as transient.

In such a case, a concurrent access that occurs while the mapping is invalid will no longer be recognized by pmap_fault() as a transient fault, so we will wind up calling vm_fault() and pmap_enter() again. In other words, pmap_kextract() won't recognize the mapping change as transient, whereas before acquiring the kernel pmap's lock would have had pmap_fault() wait until the change had completed before rechecking the address. I don't think that this repeated call to vm_fault() and pmap_enter() is going to do any harm.