Page MenuHomeFreeBSD

Support for 2nd scheme of PTE A and D bits management
ClosedPublic

Authored by br on Oct 4 2018, 9:51 PM.
Tags
None
Referenced Files
F108160530: D17424.id48755.diff
Wed, Jan 22, 1:07 AM
Unknown Object (File)
Fri, Jan 17, 6:05 PM
Unknown Object (File)
Fri, Jan 10, 1:18 PM
Unknown Object (File)
Wed, Jan 8, 11:33 PM
Unknown Object (File)
Mon, Jan 6, 9:14 AM
Unknown Object (File)
Fri, Jan 3, 7:11 PM
Unknown Object (File)
Fri, Jan 3, 6:41 PM
Unknown Object (File)
Fri, Jan 3, 6:03 PM
Subscribers

Details

Summary

Support for 2nd scheme of managing A (accessed) and D (dirty) bits of PTE.
Schemes described in page 61 of riscv-privileged-v1.10.pdf document.

QEMU uses 1st scheme.
Spike has a compile-time macro RISCV_ENABLE_DIRTY to switch between schemes.
RocketCore and derivatives (e.g. lowRISC) support 2nd scheme only.

Test Plan

Run multiuser on Qemu, Spike (with and without RISCV_ENABLE_DIRTY).
Run multiuser on lowRISC.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Hmm, I think for this we want to actually wait to set PTE_A and PTE_D for user mappings until we get a fault? Presumably we'd want to check in the page fault handler if f the PTE was already valid before calling vm_fault and if it was set PTE_A (and PTE_D for a write) and then retry?

Setting PTE_A/PTE_D ahead of time for the kernel mappings to avoid the extra faults is certainly sensible (and the manual suggests it IIRC), but if we set them always for user mappings than the VM won't get the right feedback about which pages are accessed vs dirty.

Don't set A and D bits ahead of time of User mappings. Set them on fault

In D17424#371884, @jhb wrote:

Hmm, I think for this we want to actually wait to set PTE_A and PTE_D for user mappings until we get a fault? Presumably we'd want to check in the page fault handler if f the PTE was already valid before calling vm_fault and if it was set PTE_A (and PTE_D for a write) and then retry?

Setting PTE_A/PTE_D ahead of time for the kernel mappings to avoid the extra faults is certainly sensible (and the manual suggests it IIRC), but if we set them always for user mappings than the VM won't get the right feedback about which pages are accessed vs dirty.

Agree! I modified pmap_enter()

sys/riscv/riscv/pmap.c
508 ↗(On Diff #48927)

The parens are not needed. Same comment applies in several places below.

2048 ↗(On Diff #48927)

We want to set PTE_D only if (flags & VM_PROT_WRITE) != 0. Otherwise this page will be marked dirty unnecessarily.

2425 ↗(On Diff #48927)

Don't we want PTE_R set too? Certainly this should be a leaf entry.

Fix issues pointed by markj

sys/riscv/riscv/pmap.c
2167 ↗(On Diff #48935)

Is pmap_enter() really the right place to be setting these? In the Alpha pmap (which also had software-managed A/D bits), the page fault handler would first check to see if the fault had occurred on a page that needed to set A or D and if it set the equivalent software bit and didn't call vm_fault at all. I think you really want to use similar logic to that in the page fault exception handler, not in pmap_enter. That is, when you get a load or instruction page fault, first look for the existing PTE. If it is valid and has the right permission enabled but doesn't have A set, set A and then retry without calling vm_fault at all. Similarly, for a store page fault, find the the existing PTE and check if either A or D are missing. If they are, set both and then return to retry without calling vm_fault.

2048 ↗(On Diff #48927)

Note that this is for a kernel-only mapping, so not quite sure if PTE_D matters? OTOH, we can map swap-backed objects into KVA (e.g. shm_map in sys/kern/uipc_shm.c though there are currently no in-tree consumers, but also the nvidia driver maps VM objects shared with user space into KVA), so we probably don't want to set PTE_D even on kernel mappings of regular VM objects until they are actually written to.

sys/riscv/riscv/pmap.c
2167 ↗(On Diff #48935)

FWIW, armv4 does this as well. It seems rather unfortunate that we can't avoid this lookup on riscv when the implementation handles setting PTE_A and PTE_D.

2048 ↗(On Diff #48927)

Oops, I missed that. I thought we were eagerly setting PTE_D on user mappings here. There are some submaps (exec_map, pipe_map) where this is applicable, but in those cases we are pretty much always going to dirty any accessed page. Anyway, I think it's still right to avoid setting PTE_D unnecessarily.

sys/riscv/riscv/pmap.c
2167 ↗(On Diff #48935)

Yeah. I don't think there is any way to "know" which implementation is in place (e.g. no equivalent of a x86 cpuid bit). If we found this mattered we could perhaps "notice" implicit A/D settings and then set a global to disable the PTE fetch once we notice the first one. We could also perhaps temporarily map a page at a temporary KVA during boot and touch it and see if we got a fault or not or if A was implicitly set and use that to set the global.

2048 ↗(On Diff #48927)

The manual actually says to only set A and D aggressively for memio and regions where the OS knows for certain it never needs the bits:

The A and D bits are never cleared by the implementation. If the supervisor software does not rely on accessed and/or dirty bits, e.g. if it does not swap memory pages to secondary storage or if the pages are being used to map I/O space, it should always set them to 1 in the PTE to improve performance.

(albeit this is in one of the italics sections that are just extra prose, not actual "spec")

I kind of think we should probably not be doing eager setting of A/D even for kernel mappings in pmap_enter() but perhaps only for things like the direct map or pmap_mapdev().

  • Don't set PTE_D bit even on kernel mappings of regular VM objects ahead of time
  • Don't call to vm_fault on PTE_A, PTE_D (2nd scheme) faults. Use pmap_fault_fixup() instead
br edited the summary of this revision. (Show Details)

Don't set PTE_A bit on a regular kernel mapping too

Check if PTE is valid before setting A or D bits

I think I would perhaps reword the commit message as "2nd Scheme" isn't very descriptive. Maybe something like:

Support processors that do not manage the A and D bits.

RISC-V page table entries support A (accessed) and D (dirty) bits.  The spec
makes hardware support for these bits optional.  Processors that do not
manage these bits in hardware raise page faults for accesses to a valid page
without A set and writes to a writable page without D set.  Check for these
types of faults when handling a page fault and fixup the PTE without calling
vm_fault if they occur.

followed by the paragraph noting which implementations provide hardware support for A and D vs software.

sys/riscv/riscv/pmap.c
2027 ↗(On Diff #49191)

You can trim the parens around '==' inside of '&&', for example:

if (((orig_l3 & (PTE_D | PTE_W)) == PTE_W &&
     (prot & PROT_WRITE != 0)

Also, the trailing \ is kind of odd. It seems like you maybe don't have parentheses around the entire expression currently?

2031 ↗(On Diff #49191)

Writes also need to set PTE_A, not just reads. I would just always set PTE_A in new_l3 without any condition.

2084 ↗(On Diff #49191)

We don't have to do this now, but I looked at amd64's pmap_enter() and it seems to assume that pmap_enter() is always in response to a fault so it sets PG_A (equivalent of PTE_A) always and it also sets PTE_M (equivalent of PTE_D) if flags (fault type) includes PROT_WRITE (N.B. not based on prot). amd64 does this here at the top of the function when constructing 'newpte' (same thing as 'new_l3' here).

For RISC-V processors that don't do hardware A/D bits that would seem to be an optimization to avoid an instant fault when retrying the instruction.

2425 ↗(On Diff #48927)

amd64's pmap also sets PG_NX here conditionally, so PTE_R | PTE_X is probably wanted?

  • Remove unneeded parenthesis in pmap_fault_fixup()
  • Set PTE_X for a new entry in pmap_enter_quick_locked()
br marked an inline comment as done.Oct 17 2018, 12:23 PM
br added inline comments.
sys/riscv/riscv/pmap.c
2031 ↗(On Diff #49191)

We don't want to update l3 and invalite TLB entry without a reason I think

sys/riscv/riscv/pmap.c
2031 ↗(On Diff #49191)

You've taken a fault, so if the PTE is valid the page is always going to be accessed and the 'orig != new' check means you only invalidate the TLB entry for this page if you are actually setting PTE_A. That said, this is probably fine as you would not want to set PTE_A for a write fault on a read-only mapping. The other way to handle that perhaps would be to verify permissions first and fail early just as you do for checking PTE_V, then the logic to set bits can be simpler:

    if ((orig_l3 & PTE_V) == 0 ||
        ((prot & VM_PROT_WRITE) != 0 && (orig_l3 & PTE_W) == 0) ||
        ((prot & VM_PROT_READ) != 0 && (orig_l3 & PTE_R) == 0))
            return (0);
    
    new_l3 = orig_l3 | PTE_A;
    if ((prot & VM_PROT_WRITE) != 0)
        new_l3 |= PTE_D;

    if (orig_l3 != new_l3) {
        pmap_load_store(l3, new_l3);
        pmap_invalidate_page(mmap, va);
        return (1);
    }

    /* XXX: This case should never happen since it means the PTE shouldn't have resulted in a fault .*/
    return (0);
}

My last note is a suggestion, it's also fine as-is. I will probably work on changing the pmap to honor VM_PROT_EXECUTE next which will alter this a bit anyway and might use that approach then.

This revision is now accepted and ready to land.Oct 17 2018, 5:08 PM
This revision was automatically updated to reflect the committed changes.