
Various fixes for TLB management on RISC-V.
ClosedPublic

Authored by jhb on Oct 4 2018, 5:43 PM.
Details

Summary
  • Remove the arm64-specific cpu_*cache* and cpu_tlb_flush* functions. Instead, add RISC-V specific inline functions in cpufunc.h for the fence.i and sfence.vma instructions.
  • Catch up to changes in the arm64 pmap and remove all the cpu_dcache_* calls, pmap_is_current, pmap_l3_valid_cacheable, and PTE_NEXT bits from pmap.
  • Remove references to the unimplemented riscv_setttb().
  • Remove unused cpu_nullop.
  • Add a link to the SBI doc to sbi.h.
  • Add support for a 4th argument in SBI calls. It's not documented but it seems implied for the asid argument to SBI_REMOVE_SFENCE_VMA_ASID.
  • Pass the arguments from sbi_remote_sfence*() to the SEE. BBL ignores them so this is just cosmetic.
  • Flush icaches on other CPUs when they resume from kdb in case the debugger wrote any breakpoints while the CPUs were paused in the IPI_STOP handler.
  • Add SMP vs UP versions of pmap_invalidate_* similar to amd64. The UP versions just use simple fences. The SMP versions use the sbi_remove_sfence*() functions to perform TLB shootdowns. Since we don't have a valid pm_active field in the riscv pmap, just IPI all CPUs for all invalidations for now.
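The cpufunc.h additions described in the first bullet might look roughly like the sketch below. This is an illustration of the approach, not the committed code; the function names follow the instruction mnemonics, and the `#ifdef __riscv` guards (with empty stubs) exist only so the sketch compiles on non-RISC-V hosts.

```c
#include <stdint.h>

/* Sketch of RISC-V fence wrappers in the style of cpufunc.h. */
static inline void
fence_i(void)
{
#ifdef __riscv
	/* Synchronize this hart's instruction fetches with prior stores. */
	__asm __volatile("fence.i" ::: "memory");
#endif
}

static inline void
sfence_vma(void)
{
#ifdef __riscv
	/* Flush all address-translation caches on this hart. */
	__asm __volatile("sfence.vma" ::: "memory");
#endif
}

static inline void
sfence_vma_page(uintptr_t addr)
{
#ifdef __riscv
	/* Flush translations for a single virtual page on this hart. */
	__asm __volatile("sfence.vma %0" :: "r" (addr) : "memory");
#else
	(void)addr;
#endif
}
```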
Test Plan
  • booted riscv64 in QEMU, both UP and SMP

Diff Detail

Repository
rS FreeBSD src repository - subversion

Event Timeline

looks good

sys/riscv/include/sbi.h
64 ↗(On Diff #48736)

I guess the empty line can be removed before the registers and added after them instead.

sys/cddl/dev/fbt/riscv/fbt_isa.c
81 ↗(On Diff #48736)

Don't you want to sync the icache on all harts, not just locally? Why did you omit the plain fence()?

sys/riscv/riscv/pmap.c
819 ↗(On Diff #48736)

"static" here too?

997 ↗(On Diff #48736)

Existing callers of pmap_kremove() (which are all MD) handle this, except for the code I added in unmap_ucode(). I will fix that, but I don't think it makes sense to commit the comment.

2257 ↗(On Diff #48736)

Remove the redundant parens while here?

jhb marked 10 inline comments as done. Oct 9 2018, 9:26 PM
jhb added inline comments.
sys/cddl/dev/fbt/riscv/fbt_isa.c
81 ↗(On Diff #48736)

My reading is that you only need a fence() if you are going to do an IPI, and that in the single-threaded case fence.i alone is sufficient. I stuck with fence_i() here simply because it matches the existing semantics and I wasn't sure whether I could actually IPI here or not. I did check, and none of the other architectures send an IPI here.

The FENCE.I instruction is used to synchronize the instruction and data streams. RISC-V does not guarantee that stores to instruction memory will be made visible to instruction fetches on the same RISC-V hart until a FENCE.I instruction is executed. A FENCE.I instruction only ensures that a subsequent instruction fetch on a RISC-V hart will see any previous data stores already visible to the same RISC-V hart. FENCE.I does not ensure that other RISC-V harts’ instruction fetches will observe the local hart’s stores in a multiprocessor system. To make a store to instruction memory visible to all RISC-V harts, the writing hart has to execute a data FENCE before requesting that all remote RISC-V harts execute a FENCE.I.
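The protocol in the quoted paragraph can be sketched as follows. The helper names here are hypothetical (sbi_remote_fence_i() stands in for whatever the SBI layer exposes, and is a no-op stub in this sketch), and the `#ifdef __riscv` guards exist only so the example compiles on other hosts.

```c
/* Sketch of the spec's multi-hart i-cache sync protocol: the writing
 * hart publishes its stores with a data FENCE, then asks every remote
 * hart to execute FENCE.I. */

static inline void
fence(void)
{
#ifdef __riscv
	__asm __volatile("fence" ::: "memory");
#endif
}

static inline void
fence_i(void)
{
#ifdef __riscv
	__asm __volatile("fence.i" ::: "memory");
#endif
}

/* Stand-in for an SBI remote fence.i request; a no-op in this sketch. */
static void
sbi_remote_fence_i(void)
{
}

static void
sync_icache_all_harts(void)
{
	/* 1. Order our stores to instruction memory before the request. */
	fence();
	/* 2. Ask remote harts to execute FENCE.I. */
	sbi_remote_fence_i();
	/* 3. Synchronize the local hart's instruction fetches too. */
	fence_i();
}
```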
sys/riscv/include/kdb.h
54 ↗(On Diff #48736)

However, the comment I quoted means this just needs a fence_i(). The IPI to resume other harts will include a suitable fence() before the other harts execute fence_i() when resuming.

sys/riscv/riscv/db_interface.c
155 ↗(On Diff #48736)

This fence() is also redundant.

sys/riscv/riscv/pmap.c
713 ↗(On Diff #48736)

This was actually a question. I'd prefer to remove it if you both agree it is not needed.

819 ↗(On Diff #48736)

Hmm, I was following amd64, which doesn't mark these static for some reason, but I should make them static, yes. Well, PMAP_INLINE expands to "extern inline", so I don't think static will work. amd64 exports pmap_invalidate_* in <machine/pmap.h>, though this seems broken on SMP kernels. riscv doesn't have prototypes in <machine/pmap.h>, so perhaps I should just use 'static __inline'?
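The SMP/UP split under discussion might be shaped like the sketch below (using plain `static inline` where the comment says 'static __inline'). The sbi_remote_sfence_vma() signature and the all-harts shootdown are assumptions based on this review, and the stub bodies are only there so the sketch compiles on a non-RISC-V host.

```c
#include <stdint.h>

struct pmap;		/* opaque for this sketch */

static void
sfence_vma_page(uintptr_t va)
{
#ifdef __riscv
	__asm __volatile("sfence.vma %0" :: "r" (va) : "memory");
#else
	(void)va;
#endif
}

/* Stand-in for the SBI call that runs sfence.vma on remote harts. */
static void
sbi_remote_sfence_vma(uintptr_t start, uintptr_t size)
{
	(void)start; (void)size;
}

#ifdef SMP
static inline void
pmap_invalidate_page(struct pmap *pmap, uintptr_t va)
{
	(void)pmap;
	/* No pm_active field yet, so shoot down on all harts. */
	sbi_remote_sfence_vma(va, 4096);
	sfence_vma_page(va);
}
#else
static inline void
pmap_invalidate_page(struct pmap *pmap, uintptr_t va)
{
	(void)pmap;
	/* UP: a local fence suffices. */
	sfence_vma_page(va);
}
#endif
```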

997 ↗(On Diff #48736)

The comment was more of a question. Should we have the fence here at all for kremove? It wasn't clear to me if pmap_kremove was an MI interface or not.

1022 ↗(On Diff #48736)

This is also an open question in my mind. It's not clear to me if RISC-V actually needs the subr_devmap.c stuff or if that was just copied from arm64?

2257 ↗(On Diff #48736)

What about the question I added above it? :) pmap_enter isn't modifying data that would be in the i-cache; it is only modifying PTEs, so presumably the sfence_vma() in pmap_invalidate_page() is all that is really needed (including on remote CPUs?)

jhb marked an inline comment as done.
  • Fix whitespace in sbi_call().
  • Drop fence() before single-threaded fence_i().
  • Trim parens.
sys/cddl/dev/fbt/riscv/fbt_isa.c
81 ↗(On Diff #48736)

Ok, I think this is fine. The fact that other platforms don't synchronize the update with other cores is surprising and probably incorrect.

sys/riscv/riscv/pmap.c
2250 ↗(On Diff #48952)

Hmm, is this needed if we're updating an entry and pa != opa (i.e., we're handling a CoW fault)?

713 ↗(On Diff #48736)

Indeed, I can't see a purpose for it.

997 ↗(On Diff #48736)

I would argue that it cannot usefully be an MI interface because of the lack of coherence guarantees. That said, I can't see a good reason for omitting the fence.

2257 ↗(On Diff #48736)

Doesn't that depend on the i-cache organization? With a VIVT or VIPT i-cache, couldn't a different thread still be accessing cached instructions from the old mapping despite the sfence.vma?

BTW, I think you want the i-cache flush to come before the store to the PTE. Otherwise the i-cache could end up containing a mix of contents from the old and new pages.

In response to the comment I believe you can indeed restrict the check to (prot & VM_PROT_EXECUTE) != 0, and I do think you need to invalidate remote harts' i-caches here based on what I wrote above.
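The ordering argued for here (flush remote i-caches first, then store the PTE) can be sketched as follows. The helper name and the `exec` test are illustrative stand-ins, not the committed code, and the stub body is only there so the sketch compiles.

```c
#include <stdint.h>

/* Hypothetical stand-in for a data fence plus remote fence.i. */
static void
icache_sync_all(void)
{
}

static uint64_t l3_pte;

/* Install an executable mapping: flush i-caches *before* publishing
 * the PTE, so no hart can fetch a mix of old- and new-page contents. */
static void
enter_exec_mapping(uint64_t new_l3, int exec)
{
	if (exec)
		icache_sync_all();
	l3_pte = new_l3;	/* the store to the PTE comes last */
}
```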

sys/riscv/riscv/pmap.c
2257 ↗(On Diff #48736)

Oops, s/or VIPT// in my comment above. The problem obviously doesn't apply if cache lookup requires virtual address translation.

jhb marked 9 inline comments as done. Oct 10 2018, 4:17 PM
jhb added inline comments.
sys/riscv/riscv/pmap.c
2250 ↗(On Diff #48952)

I think so. Anytime you change a PTE you need to do a sfence.vma (my understanding is that it is equivalent to invlpg on x86).

2257 ↗(On Diff #48736)

Hmm, the spec is not fully clear, but the description of fence.i only talks about synchronizing code and data streams when you are modifying code. It seems to be tailored for that case. The description of sfence.vma does say that it ensures that all implicit accesses of future instructions are ordered after stores from prior instructions:

The supervisor memory-management fence instruction SFENCE.VMA is used to synchronize updates to in-memory memory-management data structures with current execution. Instruction execution causes implicit reads and writes to these data structures; however, these implicit references are ordinarily not ordered with respect to loads and stores in the instruction stream. Executing an SFENCE.VMA instruction guarantees that any stores in the instruction stream prior to the SFENCE.VMA are ordered before all implicit references subsequent to the SFENCE.VMA.

It reads to me as if sfence.vma is all you need after updating PTEs.

I looked at some other pmap_enter() implementations. mips adds VM_PROT_EXECUTE to this check but doesn't do an IPI claiming it has to do with an unresolvable TLB miss. This might be due to MIPS using a soft MMU. arm64 has a variant of this that was added in r328510 but the condition is quite different. arm64 is careful to flush the i-cache before the invalidate. Both mips and arm64 also do it in pmap_enter_quick_locked() while riscv does not. powerpc does an icache sync after installing the new PTE. sparc64 does one after installing the new PTE.

sys/riscv/riscv/pmap.c
2250 ↗(On Diff #48952)

Right, but we've already done one in that case, line 2211.

2257 ↗(On Diff #48736)

Ok. I guess I don't really understand why any platform needs an icache flush in pmap_enter(). Thinking of the case where proc_rwmem() is used to write a breakpoint: we simulate a copy-on-write fault and update the PTE for the text page being modified, and then we perform a global icache flush. What does the flush in pmap_enter() accomplish?

jhb marked 2 inline comments as done. Oct 10 2018, 4:41 PM
jhb added inline comments.
sys/riscv/riscv/pmap.c
2257 ↗(On Diff #48736)

I did find this:

https://lkml.org/lkml/2017/11/20/596

This tries to avoid an IPI, but it requires doing a fence.i during context switches instead. It also seems to do the fence.i before the PTE update. I think for now we should just use an IPI and do it before the sfence.vma. Eventually we should add pm_active to struct pmap to at least restrict the IPIs.

jhb marked an inline comment as done. Oct 10 2018, 5:07 PM
jhb added inline comments.
sys/riscv/riscv/pmap.c
2250 ↗(On Diff #48952)

Hummmm. arm64 doesn't flush after "new" mappings, only for changed mappings (which is a bit odd). I think risc-v wants a flush always. However, on line 2211 we have explicitly cleared the PTE to 0 to mark it invalid before updating some data structures. I think that pmap_invalidate_page() is to push out the zeroed PTE, whereas the one on this line is to push the new PTE from new_l3.

sys/riscv/riscv/pmap.c
2250 ↗(On Diff #48952)

We recently changed all pmap_enter() implementations to implement mapping updates using "break-before-make"; see r335784 for instance. So, we first mark the PTE invalid, issue a TLB shootdown and then write the new PTE value. If a different thread accesses the address while it is invalid, it'll fault and sleep until the page is unbusied by the pmap_enter() caller. I don't see the need for a second shootdown - the zeroed PTE that you mention is invalid.

I don't see why you'd issue a shootdown after creating a new mapping in general. Do we have any platforms that do that?
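The break-before-make sequence described above can be modeled in plain C. This is a host-side simulation with a fake PTE and a flag standing in for a cached TLB entry, purely to illustrate the ordering; none of these names are kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

#define	PTE_V	0x1UL		/* valid bit, as in the RISC-V PTE layout */

static uint64_t fake_pte;	/* the in-memory PTE */
static bool tlb_cached;		/* models a cached translation of old PTE */

/* Stand-in for pmap_invalidate_page(): drops any cached translation. */
static void
invalidate_page(void)
{
	tlb_cached = false;
}

/* Break-before-make: invalidate, shoot down, then write the new PTE. */
static void
update_mapping(uint64_t new_pa)
{
	fake_pte = 0;			/* 1. break: mark the PTE invalid */
	invalidate_page();		/* 2. TLB shootdown of the old entry */
	fake_pte = new_pa | PTE_V;	/* 3. make: install the new PTE */
	/* No second shootdown: nothing can have cached the zeroed,
	 * invalid PTE, which is the point made in this thread. */
}
```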

2257 ↗(On Diff #48736)

Ok. I agree, we can optimize later.

jhb marked an inline comment as done. Oct 11 2018, 5:28 PM
jhb added inline comments.
sys/riscv/riscv/pmap.c
2250 ↗(On Diff #48952)

Well, pedantically speaking, the privilege spec seems to want an SFENCE after every PTE update. That said, you are probably right that we don't really need it. arm64 does one in the 'orig_l3 != 0' case when changing properties on a mapping; we could perhaps move this into that condition after the pmap_load_store()?

sys/riscv/riscv/pmap.c
2250 ↗(On Diff #48952)

mmm, I don't really see how: the spec says that sfence.vma flushes cache entries related to address translation, but the hardware shouldn't be caching translations of an invalid address.

Flushing in the orig_l3 != 0 case makes sense, as that's the only case where the existing mapping is valid. I think your suggested approach is right.

  • Drop commented out fence at the end of sfence_vma.
  • Use static instead of PMAP_INLINE for pmap_invalidate_*().
  • Use the SBI interface for remote fence.i for pmap_sync_icache.
  • Flush the i-cache on all harts before writing a new PTE for an executable mapping.
  • Don't use SFENCE.VMA after changing a mapping from invalid to valid.
jhb marked 7 inline comments as done. Oct 11 2018, 6:06 PM
This revision is now accepted and ready to land. Oct 11 2018, 7:06 PM
This revision was automatically updated to reflect the committed changes.