Page MenuHomeFreeBSD

arm64/vmm: Use the VNCR_EL2 memory page to store guest registers
Needs ReviewPublic

Authored by kajetan.puchalski_arm.com on Tue, Apr 21, 11:08 AM.
Tags
None
Referenced Files
F155018934: D56551.id.diff
Thu, Apr 30, 5:38 PM
Unknown Object (File)
Wed, Apr 29, 12:43 PM
Unknown Object (File)
Wed, Apr 29, 2:34 AM
Unknown Object (File)
Tue, Apr 28, 12:52 AM
Unknown Object (File)
Mon, Apr 27, 10:06 AM
Unknown Object (File)
Mon, Apr 27, 4:43 AM
Unknown Object (File)
Sun, Apr 26, 7:55 AM
Unknown Object (File)
Sun, Apr 26, 6:53 AM
Subscribers

Details

Reviewers
andrew
manu
markj
jhb
Group Reviewers
arm64
Summary

Wherever possible, move the storage space for guest register values from
the hypctx struct into a preallocated memory page matching the layout of
the page pointed to by VNCR_EL2.
This will streamline implementing support for nested virtualization, but
the implementation itself is not reliant on the presence of nested
virtualization architecture features.

Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 72608
Build 69491: arc lint + arc unit

Event Timeline

This is really quite an obnoxious change to deal with for Morello, where some of these registers are 128-bit capabilities not 64-bit integers...

sys/arm64/vmm/arm64.h
264

Isn't this using a byte offset, but the enum values are consecutive?

This is really quite an obnoxious change to deal with for Morello, where some of these registers are 128-bit capabilities not 64-bit integers...

Which ones specifically?
Do you just mean it's harder to deal with because without this you'd just expand the sizes of registers in hypctx, whereas the VNCR layout doesn't have the same flexibility?

sys/arm64/vmm/arm64.h
264

Enum values for registers defined with VNCR(...) are just the value of VNCR_START + <offset within the VNCR page>. The offsets are not necessarily evenly spaced out, e.g.:

VNCR_VSTTBR_EL2 0x30
VNCR_VTCR_EL2   0x40
VNCR_VSTCR_EL2  0x48

The idea here is that for any VNCR register you can just get its enum value, subtract VNCR_START and it'll give you the offset that you need to add to the address stored in VNCR to get the memory location of said register.

This is really quite an obnoxious change to deal with for Morello, where some of these registers are 128-bit capabilities not 64-bit integers...

Which ones specifically?
Do you just mean it's harder to deal with because without this you'd just expand the sizes of registers in hypctx, whereas the VNCR layout doesn't have the same flexibility?

Yes, exactly. This loses all type system flexibility. At least if you just made a struct that matched the VNCR layout we could change the types, and ok it wouldn't then match, but that wouldn't matter for our use case (and if a CHERI + Arm + nested virt system ever exists then the layout can be made to match that). Whereas now we'd have to go and meticulously tweak all the magic VNCR constants to give us a layout that works, and tweak the one-function-to-rule-them-all to be two functions, one for integers and one for capabilities, since we have multiple types.

sys/arm64/vmm/arm64.h
264

Oh, I see, VNCR_REG hides all that. That's really quite weird, having the non-VNCR ones just count up from 0, whilst the VNCR ones count up in increments of 8. I don't think two different things should be in the same enum.

Yes, exactly. This loses all type system flexibility. At least if you just made a struct that matched the VNCR layout we could change the types, and ok it wouldn't then match, but that wouldn't matter for our use case (and if a CHERI + Arm + nested virt system ever exists then the layout can be made to match that). Whereas now we'd have to go and meticulously tweak all the magic VNCR constants to give us a layout that works, and tweak the one-function-to-rule-them-all to be two functions, one for integers and one for capabilities, since we have multiple types.

I know very little about CHERI & Morello, so I'm probably missing some context here. But if I understand correctly, you'd still need to add lots of CHERI-specific handling all over the hypervisor to make it work, the assumption that system registers are all uint64_t is baked into just about every piece of the logic.
However, it seems to me that you could adapt these changes fairly easily? For instance:

  1. Add CHERI-specific values to the big enum just before VNCR_START, e.g. CHERI_<name>
  2. Allocate an array of whatever type you need to store them
  3. In hypctx_sys_reg, add if (reg > CHERI_START && reg < VNCR_START) { call some macro to access said CHERI-specific registers }

It'll compile down to the same thing.

sys/arm64/vmm/arm64.h
264

It depends how you see it, the idea behind this is that the enum values are there to tell you where to find the storage space for a given register. In the case of non-VNCR registers that's the array index, in the case of VNCR registers it's the offset. But that's just an implementation detail and the user doesn't need to directly worry about it. They're in the same enum because they're all system registers.

The primary benefit of doing this type of enum sharing is that all of this gets completely compiled away. When you write hypctx_read_sys_reg(hypctx, REG) the compiler can statically determine whether you're asking for a VNCR or non-VNCR register, this entire function call gets compiled down to just a single mrs. If I separated the enums and the values would overlap then there'd be a need for either two separate functions or some runtime checks.

We will need to handle accessing different sized registers to support FEAT_D128, e.g. VTTBR_EL2 will be 128 bit there.

sys/arm64/vmm/arm64.h
47

I guess we could do something like this so all the values are 64-bit offsets

212

This and host_ctx_addr should be a pointer type

sys/arm64/vmm/vmm_arm64.c
568–569

It's probably best to merge these into the hypctx allocation, e.g. by adding them to the size el2_hypctx_size returns. For non-VHE we need to map them into the EL2 address space below with the call to el2_map_enter

Merge hypctx & VNCR into one allocation, adjust pointer type

kajetan.puchalski_arm.com added inline comments.
sys/arm64/vmm/arm64.h
212

I made host_ctx_addr into a pointer, this however gets folded into sys_regs in a later commit and is just the register value. In that case I think it's better to leave it as-is?

Encode VNCR values into enum hypctx_sysreg as 64-bit offsets

kajetan.puchalski_arm.com added inline comments.
sys/arm64/vmm/arm64.h
47

Sure, that works too. It'll get compiled away anyway.

We will need to handle accessing different sized registers to support FEAT_D128, e.g. VTTBR_EL2 will be 128 bit there.

For VNCR registers this would just be a matter of using a different accessor, all of the registers affected by FEAT_D128 already have 16 bytes of space left for them in the VNCR page layout.
For non-VNCR registers we could just add a second adjacent field in the sysregs enum, eg REG_EL2_2 or something similar and use a special accessor to make use of them both.
The mrrs and msrr instructions load and store two 64-bit GPRs, so this would get compiled pretty nicely.

It looks like the two capability registers this change affects are elr_el1 and vbar_el1. Both of these have at least 128 bits of space before the next register, so storage shouldn't be a problem for Morello.

sp_el0, tpidr_el0, tpidrro_el0, and tpidr_el1 could be split into a new capability array when they are managed in a later change.

sys/arm64/vmm/arm64.h
212

It's still an integer there & the value we pass to a register. It's not uncommon for RES0 fields to be used in later revisions of the spec breaking the assumption it can be used as a pointer.

sys/arm64/vmm/vmm_arm64.c
586–590

Can you add el2_vncr_addr and el2_host_ctx (or similar) & set them here? We will then need to use them in vmm_hyp.c in the non-VHE case.

Handle non-VHE, store the vncr page as a pointer

sys/arm64/vmm/vmm_arm64.c
569–570

uintptr_t is the correct integer type for a pointer.

sys/arm64/vmm/vmm_arm64.c
569–570

char * is the correct pointer type for when you're just doing pointer arithmetic and don't actually need an integer to mess with it beyond addition/subtraction. uintptr_t will work but unnecessarily exposes the provenance.

It looks like the two capability registers this change affects are elr_el1 and vbar_el1. Both of these have at least 128 bits of space before the next register, so storage shouldn't be a problem for Morello.

sp_el0, tpidr_el0, tpidrro_el0, and tpidr_el1 could be split into a new capability array when they are managed in a later change.

Also sp_el1 and tpidr_el2 in the VNCR space (why is tpidr_el2 even in there? I don't know if I want to know...). I would really like to not land this patch before it's clear there's an *easy* path to making this work on Morello. Breaking bhyve on Morello would be really quite unfortunate and hostile towards CheriBSD.

It looks like the two capability registers this change affects are elr_el1 and vbar_el1. Both of these have at least 128 bits of space before the next register, so storage shouldn't be a problem for Morello.

sp_el0, tpidr_el0, tpidrro_el0, and tpidr_el1 could be split into a new capability array when they are managed in a later change.

Also sp_el1 and tpidr_el2 in the VNCR space (why is tpidr_el2 even in there? I don't know if I want to know...). I would really like to not land this patch before it's clear there's an *easy* path to making this work on Morello. Breaking bhyve on Morello would be really quite unfortunate and hostile towards CheriBSD.

Does bhyve currently work on Morello?

It looks like the two capability registers this change affects are elr_el1 and vbar_el1. Both of these have at least 128 bits of space before the next register, so storage shouldn't be a problem for Morello.

sp_el0, tpidr_el0, tpidrro_el0, and tpidr_el1 could be split into a new capability array when they are managed in a later change.

Also sp_el1 and tpidr_el2 in the VNCR space (why is tpidr_el2 even in there? I don't know if I want to know...). I would really like to not land this patch before it's clear there's an *easy* path to making this work on Morello. Breaking bhyve on Morello would be really quite unfortunate and hostile towards CheriBSD.

Does bhyve currently work on Morello?

Yes. In fact it was merged to CheriBSD before upstream FreeBSD, and we found and fixed various bugs in the process of testing it on CheriBSD, which got upstreamed (some folded into the original FreeBSD commits as they were late enough).

And see for example D55860 as things we continue to find by using it in anger downstream. I think we are by far the biggest user to date of bhyve on any form of arm64.

Ah thanks, that's good to know! I'm new to all things BSD, always nice to get some additional context :)

I would really like to not land this patch before it's clear there's an *easy* path to making this work on Morello.

What's the remaining concern in that regard, then? Looking at the cheribsd github repo it already does a bunch of #if __has_feature(capabilities).
For any arbitrary register or group thereof, you can just add an if into hypctx_sys_reg and use whatever storage you like. In the VNCR page, in a different page, in the hypctx struct, whatever works.
It's specifically meant to be flexible in cases where a register needs "special" handling and needs to be put through a dedicated function and the like.

I would really like to not land this patch before it's clear there's an *easy* path to making this work on Morello.

What's the remaining concern in that regard, then? Looking at the cheribsd github repo it already does a bunch of #if __has_feature(capabilities).
For any arbitrary register or group thereof, you can just add an if into hypctx_sys_reg and use whatever storage you like. In the VNCR page, in a different page, in the hypctx struct, whatever works.
It's specifically meant to be flexible in cases where a register needs "special" handling and needs to be put through a dedicated function and the like.

As in we'd change the definition of the hypctx_sysreg enum itself? And then do "something" to special case it in the function? And have a new hypctx_sys_reg_cap that returned uintcap_t *? That seems like it could work. Though I've tagged @markj on this review as he did the initial adaptation of the proposed bhyve/arm64 code to Morello.

Also, you could get rid of the need for specific read/write functions if you had something like:

#define hypctx_sys_reg(hypctx, reg) (*_hypctx_sys_reg(hypctx, reg))

where _hypctx_sys_reg (name just a placeholder, but maybe ok) is what hypctx_sys_reg is today in your patch. Because then it behaves like an lvalue, so you can write all of:

foo = hypctx_sys_reg(hypctx, SP_EL1);
hypctx_sys_reg(hypctx, SP_EL1) = bar;
hypctx_sys_reg(hypctx, SP_EL1) |= ...;
return (&hypctx_sys_reg(hypctx, SP_EL1));
sys/arm64/vmm/arm64.h
263

This is fragile as it's sensitive to include order, and if anything other than vmm_[n]vhe.c try to use this or the related functions they'll get the wrong implementation. Ideally this would therefore be moved to vmm_vhe.c, made unconditional, and the inline functions moved to vmm_whatever.c that gets included and needs them. And if generic compiled-once code needs to use these functions then you need a version that does dynamic dispatched based on is_vhe or whatever the function is.

265

uint64_t is not a pointer type. Also missing a space before the *.

274
287

As in we'd change the definition of the hypctx_sysreg enum itself? And then do "something" to special case it in the function? And have a new hypctx_sys_reg_cap that returned uintcap_t *? That seems like it could work. Though I've tagged @markj on this review as he did the initial adaptation of the proposed bhyve/arm64 code to Morello.

Something like that, yes. You could probably even do some preprocessor voodoo and make hypctx_sys_reg and the read/write functions macros that instantiate to the current function for normal registers, and to your hypctx_sys_reg_cap with its different return type for the capability-enabled ones.
As I've mentioned previously, currently all these functions and macros get compiled away to bare mrs & msr instructions. The "return type" is just there so that the compiler finds the right instruction.

Also, you could get rid of the need for specific read/write functions if you had something like:

I wish.. The read/write functions are just cosmetic and don't do anything at the moment, but they'll be necessary to handle certain architectural edge cases that'll come up when implementing support for nested virt. I'm just adding them ahead of time.
The reason I'm trying to funnel all reads and writes through these functions is so that when corrections need to be made to register reads and writes they only get made in one place.

As in we'd change the definition of the hypctx_sysreg enum itself? And then do "something" to special case it in the function? And have a new hypctx_sys_reg_cap that returned uintcap_t *? That seems like it could work. Though I've tagged @markj on this review as he did the initial adaptation of the proposed bhyve/arm64 code to Morello.

Something like that, yes. You could probably even do some preprocessor voodoo and make hypctx_sys_reg and the read/write functions macros that instantiate to the current function for normal registers, and to your hypctx_sys_reg_cap with its different return type for the capability-enabled ones.
As I've mentioned previously, currently all these functions and macros get compiled away to bare mrs & msr instructions. The "return type" is just there so that the compiler finds the right instruction.

Also, you could get rid of the need for specific read/write functions if you had something like:

I wish.. The read/write functions are just cosmetic and don't do anything at the moment, but they'll be necessary to handle certain architectural edge cases that'll come up when implementing support for nested virt. I'm just adding them ahead of time.
The reason I'm trying to funnel all reads and writes through these functions is so that when corrections need to be made to register reads and writes they only get made in one place.

Then why are some places using *hypctx_sys_reg directly and mutating it? What kinds of corrections need to be made?

Then why are some places using *hypctx_sys_reg directly and mutating it?

I left these wherever the existing code was doing operations like "&=" etc to not inflate the diff on this series even higher since it's currently not an issue. But yeah ideally those should be changed over as well. Especially the ones outside of vmm_reset.c.

What kinds of corrections need to be made?

There are some registers which will need a mask to be applied to them, there are some that need to be mapped to a different register, some that need to have certain bits translated and reordered and the like.

It looks like the two capability registers this change affects are elr_el1 and vbar_el1. Both of these have at least 128 bits of space before the next register, so storage shouldn't be a problem for Morello.

sp_el0, tpidr_el0, tpidrro_el0, and tpidr_el1 could be split into a new capability array when they are managed in a later change.

Also sp_el1 and tpidr_el2 in the VNCR space (why is tpidr_el2 even in there? I don't know if I want to know...). I would really like to not land this patch before it's clear there's an *easy* path to making this work on Morello. Breaking bhyve on Morello would be really quite unfortunate and hostile towards CheriBSD.

Both of these have enough space for Morello. tpidr_el2 will be needed for nested virtulization so the guest can access when it thinks it's running at EL2 it without affecting the host.

Address remaining style review comments

I think this should be all outstanding issues sorted out now. To adapt this to CHERI you could do something as simple as:

diff --git a/sys/arm64/vmm/arm64.h b/sys/arm64/vmm/arm64.h
index 064bca4194f3..1706f61037c9 100644
--- a/sys/arm64/vmm/arm64.h
+++ b/sys/arm64/vmm/arm64.h
@@ -168,6 +168,9 @@ enum hypctx_sysreg {

        NR_NON_VNCR_REGS,

+       CAPABILITY_START,
+       CAPABILITY_REG1,       <-- put capability registers in the enum to distinguish them
+
        /* VNCR Registers */
        VNCR_START,

@@ -314,6 +317,8 @@ hypctx_sys_reg(struct hypctx *hypctx, int reg /* enum hypctx_sysreg  */)
 {
        if (reg > VNCR_START)
                return (__hypctx_vncr_sysreg(hypctx, reg));
+       if (reg > CAPABILITY_START)
+               return ((uint64_t *)&hypctx->capability_regs[reg]);         <-- store them wherever is convenient, make the pointer types agree one way or another
        return (&hypctx->sys_regs[reg]);
 }

@@ -330,6 +335,12 @@ hypctx_read_sys_reg(struct hypctx *hypctx, enum hypctx_sysreg reg)
        return (*hypctx_sys_reg(hypctx, reg));
 }

+static inline uintcap_t
+hypctx_read_cap_sys_reg(struct hypctx *hypctx, enum hypctx_sysreg reg)         <-- add separate read/write accessors for capability registers and use them
+{
+       return (*(uintcap_t*)hypctx_sys_reg(hypctx, reg));     <-- cast the pointer to your desired type
+}

Idk if there are any CHERI-specific reasons why you couldn't pass a pointer to your capability storage as a uint64_t* type and then cast it back prior to dereferencing it.
If so, you could make hypctx_sys_reg return a void* then cast it to the correct type separately in each of the accessors. But that'd require removing all direct accesses to hypctx_sys_reg relying on it being uint64_t*.

sys/arm64/vmm/arm64.h
263

It's like this on purpose, the alternative implementation of __hypctx_vncr_sysreg is only relevant specifically in non-VHE vmm_hyp.c. In VHE, or anywhere *outside of vmm_hyp.c* it is not applicable. Hence, if arm64.h gets included anywhere that doesn't have this macro already defined, that means the definition here is the correct one.

sys/arm64/vmm/vmm_arm64.c
568

This assumption should be checked with _Static_assert.