Invalidate the mapping in i386's pmap_enter() before updating its physical address.
pho@ is testing this version. Thanks!
Diff Detail
Repository: rS FreeBSD src repository (subversion)
Event Timeline
i386/i386/pmap.c:3702 (On Diff #44887): In the case where pa == opa, we might clear PG_A without setting PGA_REFERENCED. amd64 has this issue as well (and had it before my change).
i386/i386/pmap.c:3687 (On Diff #44887): Perhaps assert that PG_V is set.
i386/i386/pmap.c:3687 (On Diff #44887): While I'm here, I'm thinking about porting the psind == 1 patch from amd64. That requires some restructuring of this code to compute the new PTE before acquiring the pmap lock. If I also eliminate the inconsequential differences between this version of pmap_enter() and amd64's, then this assertion will be within an "if (origpte & PG_V)" block.
i386/i386/pmap.c:3702 (On Diff #44887): pmap_enter() always presets PG_A on the new PTE. The rationale is that, with the exception of wiring, the mappings created by pmap_enter() are always followed by an access.
Umm, upon further reflection, I do think that we should reorder the pmap_ts_referenced() call and the PGA_REFERENCED check throughout vm_pageout.c. The PGA_REFERENCED check needs to come second, not first; otherwise my claim that the PMAP and PV list locking ensure we always see the reference does not hold. vm_swapout.c already does the call first and the check second.
Should I elaborate?
(Peter, to be clear, none of this impacts my request to test this patch. Please do test it at your convenience.)
I think I touched upon this in D16005. However, as you pointed out, pmap_enter() always sets PG_A in the new entry, so I'm having trouble seeing a scenario where we would miss a reference.
Yes, and that got me thinking again. :-) Historically, I worried about the dirty field, and didn't give nearly as much thought to PGA_REFERENCED.
Suppose that physical page P is mapped read only at virtual address VA; specifically, it's mapped copy-on-write, and this is the only mapping to P. Further, suppose PG_A is set within the mapping, but PGA_REFERENCED on P is clear.
Now, processor 1 calls pmap_enter() to map a new physical page NP at VA. Processor 1 acquires the pmap lock, performs the pte_load_clear() to destroy the mapping from VA to P, but hasn't set PGA_REFERENCED on P yet.
Processor 2 is running the page daemon. Processor 2 tests for PGA_REFERENCED on P, and doesn't find it set. Processor 2 calls pmap_ts_referenced() on P, and acquires the PV list lock. The PV entry mapping VA to P hasn't been destroyed yet, so processor 2 tries to get the pmap lock and fails. Processor 2 releases the PV list lock and blocks on acquiring the pmap lock.
Now, processor 1 will acquire the PV list lock, and destroy the PV entry mapping VA to P. Nothing else that pmap_enter(VA, NP) does matters to this discussion, so let's just say that pmap_enter(VA, NP) finishes, releasing the locks it held.
Processor 2 will now be allowed to acquire the pmap lock. However, it will then reacquire the PV list lock, and realize that the PV list has changed. So, it will release the pmap lock, and go back to the start. Now, the PV list for P is empty. So, pmap_ts_referenced() will return that there were no references found.
However, if you swap the order of the PGA_REFERENCED check and the pmap_ts_referenced() call, this problem won't occur. In fact, the release of the PV list lock for P within pmap_enter() "synchronizes-with" the PV list acquire within pmap_ts_referenced() on P, so the set and the later test of PGA_REFERENCED are properly synchronized even on architectures with weaker memory models.
i386/i386/pmap.c:3704–3706 (On Diff #44887): This is consistent with the original i386 code below. However, I note that the amd64 version does an initial (om->aflags & PGA_WRITEABLE) != 0 test to avoid unnecessary tests and atomics. I think that it makes sense to do the same here, especially since the page is unlikely to be PG_RW. I just want to double-check the management of the PGA_WRITEABLE flag first.
i386/i386/pmap.c:3776 (On Diff #44887): The original i386 code did (prot & VM_PROT_WRITE) == 0. (Look down and to the left.) However, you could hypothetically have a PG_RW mapping without PG_M preset, in which case you should invalidate the TLB entry.
Correct one of the KASSERTs. Reported by: pho
The physical page "om" is expected to be mapped read only, so check if the PGA_WRITEABLE flag is set before using an atomic to clear it.
Eliminate unnecessary uses of the variable "om".
I see now, thanks. I agree that reordering the PGA_REFERENCED check and the pmap_ts_referenced() call is sufficient to fix this problem.
I just looked at arm/pmap-v6.c, and it copied its PV entry management from i386. So, it doesn't have the PV entry allocation problem. Also, it breaks the old mapping before creating the new one, so it doesn't have the COW problem either.
I don't believe that arm/pmap-v4.c supports SMP, so there can't be a COW problem. (That said, it doesn't "break-before-make.") In regards to PV entry management, it recycles the old PV entry when the physical address being mapped changes, so it doesn't have the PV entry allocation problem.