Details

Reviewers

markj
jhb
manu

Commits

rS339423: Support RISC-V implementations that do not manage the A and D bits
rS339421: Support RISC-V implementations that do not manage the A and D bits

Summary

Support for 2nd scheme of managing A (accessed) and D (dirty) bits of PTE.
Schemes described in page 61 of riscv-privileged-v1.10.pdf document.

QEMU uses 1st scheme.
Spike has a compile-time macro RISCV_ENABLE_DIRTY to switch between schemes.
RocketCore and derivatives (e.g. lowRISC) support 2nd scheme only.

Test Plan

Run multiuser on Qemu, Spike (with and without RISCV_ENABLE_DIRTY).
Run multiuser on lowRISC.

Diff Detail

Lint

Lint Skipped

Unit

Tests Skipped

Event Timeline

br created this revision.Oct 4 2018, 9:51 PM

Hmm, I think for this we want to actually wait to set PTE_A and PTE_D for user mappings until we get a fault? Presumably we'd want to check in the page fault handler if f the PTE was already valid before calling vm_fault and if it was set PTE_A (and PTE_D for a write) and then retry?

Setting PTE_A/PTE_D ahead of time for the kernel mappings to avoid the extra faults is certainly sensible (and the manual suggests it IIRC), but if we set them always for user mappings than the VM won't get the right feedback about which pages are accessed vs dirty.

Don't set A and D bits ahead of time of User mappings. Set them on fault

In D17424#371884, @jhb wrote:

Hmm, I think for this we want to actually wait to set PTE_A and PTE_D for user mappings until we get a fault? Presumably we'd want to check in the page fault handler if f the PTE was already valid before calling vm_fault and if it was set PTE_A (and PTE_D for a write) and then retry?

Setting PTE_A/PTE_D ahead of time for the kernel mappings to avoid the extra faults is certainly sensible (and the manual suggests it IIRC), but if we set them always for user mappings than the VM won't get the right feedback about which pages are accessed vs dirty.

Agree! I modified pmap_enter()

restore pte.h

markj added inline comments.Oct 9 2018, 2:51 PM

sys/riscv/riscv/pmap.c
490	The parens are not needed. Same comment applies in several places below.
2085	We want to set PTE_D only if (flags & VM_PROT_WRITE) != 0. Otherwise this page will be marked dirty unnecessarily.
2451	Don't we want PTE_R set too? Certainly this should be a leaf entry.

Fix issues pointed by markj

jhb added inline comments.Oct 9 2018, 8:42 PM

sys/riscv/riscv/pmap.c
2085	Note that this is for a kernel-only mapping, so not quite sure if PTE_D matters? OTOH, we can map swap-backed objects into KVA (e.g. shm_map in sys/kern/uipc_shm.c though there are currently no in-tree consumers, but also the nvidia driver maps VM objects shared with user space into KVA), so we probably don't want to set PTE_D even on kernel mappings of regular VM objects until they are actually written to.
2196	Is pmap_enter() really the right place to be setting these? In the Alpha pmap (which also had software-managed A/D bits), the page fault handler would first check to see if the fault had occurred on a page that needed to set A or D and if it set the equivalent software bit and didn't call vm_fault at all. I think you really want to use similar logic to that in the page fault exception handler, not in pmap_enter. That is, when you get a load or instruction page fault, first look for the existing PTE. If it is valid and has the right permission enabled but doesn't have A set, set A and then retry without calling vm_fault at all. Similarly, for a store page fault, find the the existing PTE and check if either A or D are missing. If they are, set both and then return to retry without calling vm_fault.

markj added inline comments.Oct 9 2018, 11:05 PM

sys/riscv/riscv/pmap.c
2085	Oops, I missed that. I thought we were eagerly setting PTE_D on user mappings here. There are some submaps (exec_map, pipe_map) where this is applicable, but in those cases we are pretty much always going to dirty any accessed page. Anyway, I think it's still right to avoid setting PTE_D unnecessarily.
2196	FWIW, armv4 does this as well. It seems rather unfortunate that we can't avoid this lookup on riscv when the implementation handles setting PTE_A and PTE_D.

jhb added inline comments.Oct 9 2018, 11:23 PM

sys/riscv/riscv/pmap.c
2085	The manual actually says to only set A and D aggressively for memio and regions where the OS knows for certain it never needs the bits: The A and D bits are never cleared by the implementation. If the supervisor software does not rely on accessed and/or dirty bits, e.g. if it does not swap memory pages to secondary storage or if the pages are being used to map I/O space, it should always set them to 1 in the PTE to improve performance. (albeit this is in one of the italics sections that are just extra prose, not actual "spec") I kind of think we should probably not be doing eager setting of A/D even for kernel mappings in pmap_enter() but perhaps only for things like the direct map or pmap_mapdev().
2196	Yeah. I don't think there is any way to "know" which implementation is in place (e.g. no equivalent of a x86 cpuid bit). If we found this mattered we could perhaps "notice" implicit A/D settings and then set a global to disable the PTE fetch once we notice the first one. We could also perhaps temporarily map a page at a temporary KVA during boot and touch it and see if we got a fault or not or if A was implicitly set and use that to set the global.

Don't set PTE_D bit even on kernel mappings of regular VM objects ahead of time
Don't call to vm_fault on PTE_A, PTE_D (2nd scheme) faults. Use pmap_fault_fixup() instead

Don't set PTE_A bit on a regular kernel mapping too

Check if PTE is valid before setting A or D bits

I think I would perhaps reword the commit message as "2nd Scheme" isn't very descriptive. Maybe something like:

Support processors that do not manage the A and D bits.

RISC-V page table entries support A (accessed) and D (dirty) bits.  The spec
makes hardware support for these bits optional.  Processors that do not
manage these bits in hardware raise page faults for accesses to a valid page
without A set and writes to a writable page without D set.  Check for these
types of faults when handling a page fault and fixup the PTE without calling
vm_fault if they occur.

followed by the paragraph noting which implementations provide hardware support for A and D vs software.

sys/riscv/riscv/pmap.c
2027	You can trim the parens around '==' inside of '&&', for example: if (((orig_l3 & (PTE_D \| PTE_W)) == PTE_W && (prot & PROT_WRITE != 0) Also, the trailing \ is kind of odd. It seems like you maybe don't have parentheses around the entire expression currently?
2031	Writes also need to set PTE_A, not just reads. I would just always set PTE_A in new_l3 without any condition.
2084	We don't have to do this now, but I looked at amd64's pmap_enter() and it seems to assume that pmap_enter() is always in response to a fault so it sets PG_A (equivalent of PTE_A) always and it also sets PTE_M (equivalent of PTE_D) if flags (fault type) includes PROT_WRITE (N.B. not based on prot). amd64 does this here at the top of the function when constructing 'newpte' (same thing as 'new_l3' here). For RISC-V processors that don't do hardware A/D bits that would seem to be an optimization to avoid an instant fault when retrying the instruction.
2451	amd64's pmap also sets PG_NX here conditionally, so PTE_R \| PTE_X is probably wanted?

Remove unneeded parenthesis in pmap_fault_fixup()
Set PTE_X for a new entry in pmap_enter_quick_locked()

br marked an inline comment as done.Oct 17 2018, 12:23 PM

br added inline comments.

sys/riscv/riscv/pmap.c
2031	We don't want to update l3 and invalite TLB entry without a reason I think

jhb added inline comments.Oct 17 2018, 5:06 PM

sys/riscv/riscv/pmap.c
2031	You've taken a fault, so if the PTE is valid the page is always going to be accessed and the 'orig != new' check means you only invalidate the TLB entry for this page if you are actually setting PTE_A. That said, this is probably fine as you would not want to set PTE_A for a write fault on a read-only mapping. The other way to handle that perhaps would be to verify permissions first and fail early just as you do for checking PTE_V, then the logic to set bits can be simpler: if ((orig_l3 & PTE_V) == 0 \|\| ((prot & VM_PROT_WRITE) != 0 && (orig_l3 & PTE_W) == 0) \|\| ((prot & VM_PROT_READ) != 0 && (orig_l3 & PTE_R) == 0)) return (0); new_l3 = orig_l3 \| PTE_A; if ((prot & VM_PROT_WRITE) != 0) new_l3 \|= PTE_D; if (orig_l3 != new_l3) { pmap_load_store(l3, new_l3); pmap_invalidate_page(mmap, va); return (1); } /* XXX: This case should never happen since it means the PTE shouldn't have resulted in a fault .*/ return (0); }

My last note is a suggestion, it's also fine as-is. I will probably work on changing the pmap to honor VM_PROT_EXECUTE next which will alter this a bit anyway and might use that approach then.

This revision is now accepted and ready to land.Oct 17 2018, 5:08 PM

Closed by commit rS339421: Support RISC-V implementations that do not manage the A and D bits (authored by br). · Explain WhyOct 18 2018, 3:08 PM

This revision was automatically updated to reflect the committed changes.

Herald added a reviewer: manu. · View Herald TranscriptOct 18 2018, 3:08 PM

Herald added subscribers: andrew, imp. · View Herald Transcript

Support for 2nd scheme of PTE A and D bits management
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 49191

sys/riscv/include/pmap.h

sys/riscv/include/pte.h

sys/riscv/riscv/locore.S

sys/riscv/riscv/pmap.c

sys/riscv/riscv/trap.c

Support for 2nd scheme of PTE A and D bits managementClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 49191

sys/riscv/include/pmap.h

sys/riscv/include/pte.h

sys/riscv/riscv/locore.S

sys/riscv/riscv/pmap.c

sys/riscv/riscv/trap.c

Support for 2nd scheme of PTE A and D bits management
ClosedPublic
Actions

Revision Contents
Changeset List