sys/arm64/arm64/locore.S
 */
drop_to_el1:
	mrs	x23, CurrentEL
	lsr	x23, x23, #2
	cmp	x23, #0x2
	b.eq	1f
	ret
1:
	/*
	 * If the MMU is active, then it is using a page table where VA == PA.
	 * But the page table won't have entries for the hypervisor EL2
	 * initialization code which is loaded into memory with the vmm module.
	 *
	 * So we disable the MMU in EL2 to make the vmm hypervisor code run
	 * successfully.
	 */
	dsb	sy
alexandru.elisei_arm.com: I'm not sure about the purpose of this instruction. I remember copying it from where we disable the MMU from EL1 (after drop_to_el1 returns), but now I can't figure out why this is used. If the MMU is off, it doesn't do anything because data accesses are to Device-nGnRnE memory according to ARM DDI 0487F.b, page D5-2586 (the PE reads and writes directly to main memory). If the MMU is on, I believe this still has no effect because:
	mrs	x2, sctlr_el2
	bic	x2, x2, SCTLR_M
	msr	sctlr_el2, x2
alexandru.elisei_arm.com: I haven't been reading the code, but I haven't found where the dcache is invalidated before the MMU is turned on at EL1. Where is that done?
andrew: In the normal boot process we expect the bootloader to have invalidated the dcache.
alexandru.elisei_arm.com: The patch itself looks alright to me, but I haven't tested it. As for why I think FreeBSD should do dcache invalidation before turning the MMU back on, wall of text below.

I've been reading the Arm ARM, DDI 0487F.b, trying to find out exactly what a CPU is allowed to speculate, and I've only found what it is *not* allowed to speculate (section B2.3.6 "Restrictions on the effects of speculation"). From that I assume the CPU is able to speculate *any* loads regardless of the program order, and modify the dcache as a result. This gets somewhat confirmed by the definition of the SB (Speculation Barrier) instruction (page C6-1188): "In particular, any instruction that appears later in the program order than the barrier cannot cause a speculative allocation into any caching structure where the allocation of that entry could be

My interpretation of that statement is that without the barrier, the PE is allowed to modify the dcache as a result of speculation. This is the definition of Device memory from page B2-117: "The Arm architecture forbids Speculative reads of any type of Device memory. This means Device memory types are suitable attributes for read-sensitive Locations." When address translation is disabled, data accesses are to the Device-nGnRnE memory type (page D5-2586).

The point I am trying to make is that as long as the MMU is enabled (SCTLR_ELx.M == 1), the PE is allowed to speculate any loads and modify the dcache. This is something that FreeBSD allows the bootloader to do. The other side of the story is that even if FreeBSD starts execution with the MMU disabled, if it's running under a hypervisor, the dcaches might have been populated while the host was running on the same PE (either intentionally or as a result of speculation).

As a result, dcache invalidation must be performed right before the MMU is turned on (after the last load in program order, to be more precise), so any speculative reads on the host side will populate the dcache with the latest values that the guest wrote to memory. I hope what I wrote above makes sense.
andrew: Which memory do you think needs d-cache management? As far as I can tell we should invalidate the d-cache for the page tables as there may be data for them in the cache; however, we create them with the cache disabled.
	isb

	/* Enable the HVC instruction and make EL1 AArch64 */
	ldr	x2, hcr
	msr	hcr_el2, x2

	/* Load the Virtualization Process ID Register */
	mrs	x2, midr_el1
	msr	vpidr_el2, x2

	/* Load the Virtualization Multiprocess ID Register */
	mrs	x2, mpidr_el1
	/* ... 13 lines not shown ... */
1:
	/* Enable access to the physical timers at EL1 */
	mrs	x2, cnthctl_el2
	orr	x2, x2, #(CNTHCTL_EL1PCTEN | CNTHCTL_EL1PCEN)
	msr	cnthctl_el2, x2

	/* Set the counter offset to a known value */
	msr	cntvoff_el2, xzr

	/* Install hypervisor trap functions */
	adrp	x2, hyp_stub_vectors
	add	x2, x2, :lo12:hyp_stub_vectors
	msr	vbar_el2, x2

	/* Use the host VTTBR_EL2 to tell the host and the guests apart */
	mov	x2, #VTTBR_HOST
	msr	vttbr_el2, x2

	mov	x2, #(PSR_F | PSR_I | PSR_A | PSR_D | PSR_M_EL1h)
	msr	spsr_el2, x2
	/* Configure GICv3 CPU interface */
	mrs	x2, id_aa64pfr0_el1
	/* Extract GIC bits from the register */
	ubfx	x2, x2, #ID_AA64PFR0_GIC_SHIFT, #ID_AA64PFR0_GIC_BITS
	/* GIC[3:0] == 0001 - GIC CPU interface via special regs. supported */
	cmp	x2, #(ID_AA64PFR0_GIC_CPUIF_EN >> ID_AA64PFR0_GIC_SHIFT)
	b.ne	2f

	mrs	x2, icc_sre_el2
	orr	x2, x2, #ICC_SRE_EL2_EN	/* Enable access from insecure EL1 */
	orr	x2, x2, #ICC_SRE_EL2_SRE	/* Enable system registers */
	msr	icc_sre_el2, x2
2:
	/* Set the address to return to our return address */
	msr	elr_el2, x30
	isb
alexandru.elisei_arm.com: According to the definition of a context synchronization event from ARM DDI 0487F.b, page Glossary-8112, the isb and eret instructions are equivalent. I believe this isb is redundant.
	eret

	.align	3
.Lsctlr_res1:
	.quad	SCTLR_RES1
hcr:
	/* Make sure the HVC instruction is not disabled */
	.quad	(HCR_RW & ~HCR_HCD)
/*
 * Get the delta between the physical address we were loaded to and the
 * virtual address we expect to run from. This is used when building the
 * initial page table.
 */
get_virt_delta:
	/* Load the physical address of virt_map */