Address the COW problem in i386's pmap_enter()
Closed, Public

Authored by alc on Jul 5 2018, 5:54 AM.
Details

Summary

Invalidate the mapping in i386's pmap_enter() before updating its physical address.
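For context, the change amounts to enforcing a break-before-make discipline when the physical address behind a valid mapping changes. A minimal sketch, using hypothetical locals (origpte, newpte, opa, om) named after the conventions in pmap_enter(); the committed diff may structure this differently:

```c
/*
 * Break before make: fully invalidate the old mapping before
 * installing a PTE with a new physical address, so that no other
 * processor can keep writing through a stale, still-writable TLB
 * entry while the copy-on-write transition is in progress.
 */
if ((origpte & PG_V) != 0 && opa != pa) {
        origpte = pte_load_clear(pte);  /* Break: zap the old PTE. */
        pmap_invalidate_page(pmap, va); /* Flush stale TLB entries. */
        if ((origpte & PG_MANAGED) != 0) {
                om = PHYS_TO_VM_PAGE(opa);
                if ((origpte & PG_A) != 0)
                        vm_page_aflag_set(om, PGA_REFERENCED);
                if ((origpte & (PG_M | PG_RW)) == (PG_M | PG_RW))
                        vm_page_dirty(om);
        }
}
pte_store(pte, newpte);                 /* Make: install the new mapping. */
```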

Test Plan

pho@ is testing this version. Thanks!

Diff Detail

Repository
rS FreeBSD src repository - subversion

Event Timeline

i386/i386/pmap.c
3702 ↗(On Diff #44887)

In the case where pa == opa, we might clear PG_A without setting PGA_REFERENCED. amd64 has this issue as well (and had it before my change).
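A sketch of the handling this comment suggests, with origpte and newpte as hypothetical locals (per the follow-up below, pmap_enter() presets PG_A on the new PTE, so the window may not arise in practice):

```c
/*
 * pa == opa case: if the old PTE carried a hardware-set PG_A and
 * the rewritten PTE would drop it, record the reference on the
 * vm_page rather than silently discarding it.
 */
if ((origpte & (PG_A | PG_MANAGED)) == (PG_A | PG_MANAGED) &&
    (newpte & PG_A) == 0)
        vm_page_aflag_set(m, PGA_REFERENCED);
```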

i386/i386/pmap.c
3687 ↗(On Diff #44887)

Perhaps assert that PG_V is set.
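For instance (illustrative only; the panic message is made up):

```c
KASSERT((origpte & PG_V) != 0,
    ("pmap_enter: original PTE is not valid"));
```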

i386/i386/pmap.c
3687 ↗(On Diff #44887)

While I'm here, I'm thinking about porting the psind==1 patch from amd64. That requires some restructuring of this code to compute the new PTE before acquiring the pmap lock. If I also eliminate the inconsequential differences between this version of pmap_enter() and amd64's, then this assertion will be within an "if (origpte & PG_V)" block.
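A rough sketch of that shape, entirely hypothetical since the psind == 1 port had not been written at this point (newpte is composed lock-free, and the PTE lookup under the lock is elided):

```c
/* Compose the new PTE before taking the pmap lock, as amd64 does. */
newpte = (pt_entry_t)(pa | PG_A | PG_V);
if ((prot & VM_PROT_WRITE) != 0)
        newpte |= PG_RW;
if (wired)
        newpte |= PG_W;

PMAP_LOCK(pmap);
/* ... locate pte for va (lookup elided) ... */
origpte = pte_load(pte);
if ((origpte & PG_V) != 0) {
        /*
         * The assertion discussed above would sit here, with PG_V
         * now established by the enclosing test.
         */
}
```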

3702 ↗(On Diff #44887)

pmap_enter() always presets PG_A on the new PTE, the rationale being that, with the exception of wiring, the mappings created by pmap_enter() are always followed immediately by an access.

Peter, could you please test this i386 patch?

Umm, upon further reflection, I do think that we should reorder the pmap_ts_referenced() call and the PGA_REFERENCED check throughout vm_pageout.c. The PGA_REFERENCED check needs to come second, not first, in order for my claim (that the PMAP and PV list locking ensure that we always see the reference) to hold. vm_swapout.c already does the call first and the check second.

Should I elaborate?

(Peter, to be clear, none of this impacts my request to test this patch. Please do test it at your convenience.)

In D16133#342528, @alc wrote:

> Umm, upon further reflection, I do think that we should reorder the pmap_ts_referenced() call and the PGA_REFERENCED check throughout vm_pageout.c. The PGA_REFERENCED check needs to come second, not first, in order for my claim (that the PMAP and PV list locking ensure that we always see the reference) to hold. vm_swapout.c already does the call first and the check second.
>
> Should I elaborate?

I think I touched upon this in D16005. However, as you pointed out, pmap_enter() always sets PG_A in the new entry, so I'm having trouble seeing a scenario where we would miss a reference.

In D16133#342528, @alc wrote:

>> Umm, upon further reflection, I do think that we should reorder the pmap_ts_referenced() call and the PGA_REFERENCED check throughout vm_pageout.c. The PGA_REFERENCED check needs to come second, not first, in order for my claim (that the PMAP and PV list locking ensure that we always see the reference) to hold. vm_swapout.c already does the call first and the check second.
>>
>> Should I elaborate?
>
> I think I touched upon this in D16005. However, as you pointed out, pmap_enter() always sets PG_A in the new entry, so I'm having trouble seeing a scenario where we would miss a reference.

Yes, and that got me thinking again. :-) Historically, I worried about the dirty field, and didn't give nearly as much thought to PGA_REFERENCED.

Suppose that physical page P is mapped read only at virtual address VA; specifically, it's mapped copy-on-write, and this is the only mapping to P. Further, suppose PG_A is set within the mapping, but PGA_REFERENCED on P is clear.

Now, processor 1 calls pmap_enter() to map a new physical page NP at VA. Processor 1 acquires the pmap lock, performs the pte_load_clear() to destroy the mapping from VA to P, but hasn't set PGA_REFERENCED on P yet.

Processor 2 is running the page daemon. Processor 2 tests for PGA_REFERENCED on P, and doesn't find it set. Processor 2 calls pmap_ts_referenced() on P, and acquires the PV list lock. The PV entry mapping VA to P hasn't been destroyed yet, so processor 2 tries to get the pmap lock and fails. Processor 2 releases the PV list lock and blocks on acquiring the pmap lock.

Now, processor 1 will acquire the PV list lock, and destroy the PV entry mapping VA to P. Nothing else that pmap_enter(VA, NP) does matters to this discussion, so let's just say that pmap_enter(VA, NP) finishes, releasing the locks it held.

Processor 2 will now be allowed to acquire the pmap lock. However, it will then reacquire the PV list lock, and realize that the PV list has changed. So, it will release the pmap lock, and go back to the start. Now, the PV list for P is empty. So, pmap_ts_referenced() will return that there were no references found.

However, if you swap the order of the PGA_REFERENCED check and the pmap_ts_referenced() call, this problem won't occur. In fact, the release of the PV list lock for P within pmap_enter() "synchronizes-with" the PV list lock acquire within pmap_ts_referenced() on P, so the set and the later test for PGA_REFERENCED are properly synchronized even on architectures with weaker memory models.
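To make the proposed ordering concrete, here is a minimal sketch in the style of vm_pageout.c's scan loop (act_delta is the scan's local reference counter; treat this as an illustration, not the committed diff):

```c
/*
 * Proposed ordering: let pmap_ts_referenced() acquire the PV list
 * lock first, and only then consult PGA_REFERENCED.  The PV list
 * lock release in pmap_enter() then synchronizes-with the acquire
 * in pmap_ts_referenced(), so a PGA_REFERENCED set by pmap_enter()
 * is guaranteed to be visible to the test below.
 */
act_delta = pmap_ts_referenced(m);
if ((m->aflags & PGA_REFERENCED) != 0) {
        vm_page_aflag_clear(m, PGA_REFERENCED);
        act_delta++;
}
```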

i386/i386/pmap.c
3704–3706 ↗(On Diff #44887)

This is consistent with the original i386 code below. However, I note that the amd64 version does an initial (om->aflags & PGA_WRITEABLE) != 0 test to avoid unnecessary tests and atomics. I think that it makes sense to do the same here, especially since the page is unlikely to be PG_RW. I just want to double-check the management of the PGA_WRITEABLE flag first.
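The amd64 pattern being referenced looks roughly like this (a sketch; om is the old page and opa its physical address, and the PV-list emptiness checks would need to match i386's structures):

```c
/*
 * Test PGA_WRITEABLE with a plain load first, so the common case
 * (a read-only COW page) skips both the PV list checks and the
 * atomic clear.
 */
if ((om->aflags & PGA_WRITEABLE) != 0 &&
    TAILQ_EMPTY(&om->md.pv_list) &&
    ((om->flags & PG_FICTITIOUS) != 0 ||
    TAILQ_EMPTY(&pa_to_pvh(opa)->pv_list)))
        vm_page_aflag_clear(om, PGA_WRITEABLE);
```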

3776 ↗(On Diff #44887)

The original i386 code did (prot & VM_PROT_WRITE) == 0. (Look down and to the left.) However, you could hypothetically have a PG_RW mapping without PG_M preset, in which case you should still invalidate the TLB entry.
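In code, the condition being argued for might look like the following sketch (origpte and newpte are hypothetical locals for the old and new PTE values):

```c
/*
 * A PG_RW PTE may be cached in the TLB as writable even when PG_M
 * was never preset, so removing write access must invalidate the
 * TLB entry regardless of PG_M.
 */
if ((origpte & PG_M) != 0 ||
    ((origpte & PG_RW) != 0 && (newpte & PG_RW) == 0))
        pmap_invalidate_page(pmap, va);
```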

In D16133#342513, @alc wrote:

> Peter, could you please test this i386 patch?

Absolutely.

alc edited the test plan for this revision.

Correct one of the KASSERTs. Reported by: pho

The physical page "om" is expected to be mapped read only, so check if the PGA_WRITEABLE flag is set before using an atomic to clear it.

Eliminate unnecessary uses of the variable "om".

This revision is now accepted and ready to land. Jul 6 2018, 4:27 PM
In D16133#342620, @alc wrote:

> However, if you swap the order of the PGA_REFERENCED check and the pmap_ts_referenced() call, this problem won't occur. In fact, the release of the PV list lock for P within pmap_enter() "synchronizes-with" the PV list lock acquire within pmap_ts_referenced() on P, so the set and the later test for PGA_REFERENCED are properly synchronized even on architectures with weaker memory models.

I see now, thanks. I agree that reordering the PGA_REFERENCED check and the pmap_ts_referenced() call is sufficient to fix this problem.

I just looked at arm/pmap-v6.c, and it copied its PV entry management from i386. So, it doesn't have the PV entry allocation problem. Also, it breaks the old mapping before creating the new one, so it doesn't have the COW problem either.

I don't believe that arm/pmap-v4.c supports SMP, so there can't be a COW problem. (That said, it doesn't "break-before-make.") In regards to PV entry management, it recycles the old PV entry when the physical address being mapped changes, so it doesn't have the PV entry allocation problem.

Ran all of the stress2 tests with this patch on i386. No problems found.

This revision was automatically updated to reflect the committed changes.