Page MenuHomeFreeBSD

arm64/vmm: Use the VNCR_EL2 memory page to store guest registers
Needs ReviewPublic

Authored by kajetan.puchalski_arm.com on Tue, Apr 21, 11:08 AM.

Details

Reviewers
andrew
manu
markj
jhb
Group Reviewers
arm64
Summary

Wherever possible, move the storage space for guest register values from
the hypctx struct into a preallocated memory page matching the layout of
the page pointed to by VNCR_EL2.
This will streamline implementing support for nested virtualization, but
the implementation itself is not reliant on the presence of nested
virtualization architecture features.

Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 72441
Build 69324: arc lint + arc unit

Event Timeline

This is really quite an obnoxious change to deal with for Morello, where some of these registers are 128-bit capabilities not 64-bit integers...

sys/arm64/vmm/arm64.h
258

Isn't this using a byte offset, but the enum values are consecutive?

This is really quite an obnoxious change to deal with for Morello, where some of these registers are 128-bit capabilities not 64-bit integers...

Which ones specifically?
Do you just mean it's harder to deal with because without this you'd just expand the sizes of registers in hypctx, whereas the VNCR layout doesn't have the same flexibility?

sys/arm64/vmm/arm64.h
258

Enum values for registers defined with VNCR(...) are just the value of VNCR_START + <offset within the VNCR page>. The offsets are not necessarily evenly spaced out, e.g.:

VNCR_VSTTBR_EL2 0x30
VNCR_VTCR_EL2   0x40
VNCR_VSTCR_EL2  0x48

The idea here is that for any VNCR register you can just get its enum value, subtract VNCR_START and it'll give you the offset that you need to add to the address stored in VNCR to get the memory location of said register.

This is really quite an obnoxious change to deal with for Morello, where some of these registers are 128-bit capabilities not 64-bit integers...

Which ones specifically?
Do you just mean it's harder to deal with because without this you'd just expand the sizes of registers in hypctx, whereas the VNCR layout doesn't have the same flexibility?

Yes, exactly. This loses all type system flexibility. At least if you just made a struct that matched the VNCR layout we could change the types, and ok it wouldn't then match, but that wouldn't matter for our use case (and if a CHERI + Arm + nested virt system ever exists then the layout can be made to match that). Whereas now we'd have to go and meticulously tweak all the magic VNCR constants to give us a layout that works, and tweak the one-function-to-rule-them-all to be two functions, one for integers and one for capabilities, since we have multiple types.

sys/arm64/vmm/arm64.h
258

Oh, I see, VNCR_REG hides all that. That's really quite weird, having the non-VNCR ones just count up from 0, whilst the VNCR ones count up in increments of 8. I don't think two different things should be in the same enum.

Yes, exactly. This loses all type system flexibility. At least if you just made a struct that matched the VNCR layout we could change the types, and ok it wouldn't then match, but that wouldn't matter for our use case (and if a CHERI + Arm + nested virt system ever exists then the layout can be made to match that). Whereas now we'd have to go and meticulously tweak all the magic VNCR constants to give us a layout that works, and tweak the one-function-to-rule-them-all to be two functions, one for integers and one for capabilities, since we have multiple types.

I know very little about CHERI & Morello, so I'm probably missing some context here. But if I understand correctly, you'd still need to add lots of CHERI-specific handling all over the hypervisor to make it work, the assumption that system registers are all uint64_t is baked into just about every piece of the logic.
However, it seems to me that you could adapt these changes fairly easily? For instance:

  1. Add CHERI-specific values to the big enum just before VNCR_START, e.g. CHERI_<name>
  2. Allocate an array of whatever type you need to store them
  3. In hypctx_sys_reg, add if (reg > CHERI_START && reg < VNCR_START) { call some macro to access said CHERI-specific registers }

It'll compile down to the same thing.

sys/arm64/vmm/arm64.h
258

It depends how you see it, the idea behind this is that the enum values are there to tell you where to find the storage space for a given register. In the case of non-VNCR registers that's the array index, in the case of VNCR registers it's the offset. But that's just an implementation detail and the user doesn't need to directly worry about it. They're in the same enum because they're all system registers.

The primary benefit of doing this type of enum sharing is that all of this gets completely compiled away. When you write hypctx_read_sys_reg(hypctx, REG) the compiler can statically determine whether you're asking for a VNCR or non-VNCR register, this entire function call gets compiled down to just a single mrs. If I separated the enums and the values would overlap then there'd be a need for either two separate functions or some runtime checks.

We will need to handle accessing different sized registers to support FEAT_D128, e.g. VTTBR_EL2 will be 128 bit there.

sys/arm64/vmm/arm64.h
47

I guess we could do something like this so all the values are 64-bit offsets

212

This and host_ctx_addr should be a pointer type

sys/arm64/vmm/vmm_arm64.c
568–569

It's probably best to merge these into the hypctx allocation, e.g. by adding them to the size el2_hypctx_size returns. For non-VHE we need to map them into the EL2 address space below with the call to el2_map_enter

Merge hypctx & VNCR into one allocation, adjust pointer type

kajetan.puchalski_arm.com added inline comments.
sys/arm64/vmm/arm64.h
212

I made host_ctx_addr into a pointer, this however gets folded into sys_regs in a later commit and is just the register value. In that case I think it's better to leave it as-is?

Encode VNCR values into enum hypctx_sysreg as 64-bit offsets

kajetan.puchalski_arm.com added inline comments.
sys/arm64/vmm/arm64.h
47

Sure, that works too. It'll get compiled away anyway.

We will need to handle accessing different sized registers to support FEAT_D128, e.g. VTTBR_EL2 will be 128 bit there.

For VNCR registers this would just be a matter of using a different accessor, all of the registers affected by FEAT_D128 already have 16 bytes of space left for them in the VNCR page layout.
For non-VNCR registers we could just add a second adjacent field in the sysregs enum, eg REG_EL2_2 or something similar and use a special accessor to make use of them both.
The mrrs and msrr instructions load and store two 64-bit GPRs, so this would get compiled pretty nicely.

It looks like the two capability registers this change affects are elr_el1 and vbar_el1. Both of these have at least 128 bits of space before the next register, so storage shouldn't be a problem for Morello.

sp_el0, tpidr_el0, tpidrro_el0, and tpidr_el1 could be split into a new capability array when they are managed in a later change.

sys/arm64/vmm/arm64.h
212

It's still an integer there & the value we pass to a register. It's not uncommon for RES0 fields to be used in later revisions of the spec breaking the assumption it can be used as a pointer.

sys/arm64/vmm/vmm_arm64.c
586–587

Can you add el2_vncr_addr and el2_host_ctx (or similar) & set them here? We will then need to use them in vmm_hyp.c in the non-VHE case.

Handle non-VHE, store the vncr page as a pointer

sys/arm64/vmm/vmm_arm64.c
569–570

uintptr_t is the correct integer type for a pointer.

sys/arm64/vmm/vmm_arm64.c
569–570

char * is the correct pointer type for when you're just doing pointer arithmetic and don't actually need an integer to mess with it beyond addition/subtraction. uintptr_t will work but unnecessarily exposes the provenance.

It looks like the two capability registers this change affects are elr_el1 and vbar_el1. Both of these have at least 128 bits of space before the next register, so storage shouldn't be a problem for Morello.

sp_el0, tpidr_el0, tpidrro_el0, and tpidr_el1 could be split into a new capability array when they are managed in a later change.

Also sp_el1 and tpidr_el2 in the VNCR space (why is tpidr_el2 even in there? I don't know if I want to know...). I would really like to not land this patch before it's clear there's an *easy* path to making this work on Morello. Breaking bhyve on Morello would be really quite unfortunate and hostile towards CheriBSD.

It looks like the two capability registers this change affects are elr_el1 and vbar_el1. Both of these have at least 128 bits of space before the next register, so storage shouldn't be a problem for Morello.

sp_el0, tpidr_el0, tpidrro_el0, and tpidr_el1 could be split into a new capability array when they are managed in a later change.

Also sp_el1 and tpidr_el2 in the VNCR space (why is tpidr_el2 even in there? I don't know if I want to know...). I would really like to not land this patch before it's clear there's an *easy* path to making this work on Morello. Breaking bhyve on Morello would be really quite unfortunate and hostile towards CheriBSD.

Does bhyve currently work on Morello?

It looks like the two capability registers this change affects are elr_el1 and vbar_el1. Both of these have at least 128 bits of space before the next register, so storage shouldn't be a problem for Morello.

sp_el0, tpidr_el0, tpidrro_el0, and tpidr_el1 could be split into a new capability array when they are managed in a later change.

Also sp_el1 and tpidr_el2 in the VNCR space (why is tpidr_el2 even in there? I don't know if I want to know...). I would really like to not land this patch before it's clear there's an *easy* path to making this work on Morello. Breaking bhyve on Morello would be really quite unfortunate and hostile towards CheriBSD.

Does bhyve currently work on Morello?

Yes. In fact it was merged to CheriBSD before upstream FreeBSD, and we found and fixed various bugs in the process of testing it on CheriBSD, which got upstreamed (some folded into the original FreeBSD commits as they were late enough).

And see for example D55860 as things we continue to find by using it in anger downstream. I think we are by far the biggest user to date of bhyve on any form of arm64.

Ah thanks, that's good to know! I'm new to all things BSD, always nice to get some additional context :)

I would really like to not land this patch before it's clear there's an *easy* path to making this work on Morello.

What's the remaining concern in that regard, then? Looking at the cheribsd github repo it already does a bunch of #if __has_feature(capabilities).
For any arbitrary register or group thereof, you can just add an if into hypctx_sys_reg and use whatever storage you like. In the VNCR page, in a different page, in the hypctx struct, whatever works.
It's specifically meant to be flexible in cases where a register needs "special" handling and needs to be put through a dedicated function and the like.

I would really like to not land this patch before it's clear there's an *easy* path to making this work on Morello.

What's the remaining concern in that regard, then? Looking at the cheribsd github repo it already does a bunch of #if __has_feature(capabilities).
For any arbitrary register or group thereof, you can just add an if into hypctx_sys_reg and use whatever storage you like. In the VNCR page, in a different page, in the hypctx struct, whatever works.
It's specifically meant to be flexible in cases where a register needs "special" handling and needs to be put through a dedicated function and the like.

As in we'd change the definition of the hypctx_sysreg enum itself? And then do "something" to special case it in the function? And have a new hypctx_sys_reg_cap that returned uintcap_t *? That seems like it could work. Though I've tagged @markj on this review as he did the initial adaptation of the proposed bhyve/arm64 code to Morello.

Also, you could get rid of the need for specific read/write functions if you had something like:

#define hypctx_sys_reg(hypctx, reg) (*_hypctx_sys_reg(hypctx, reg))

where _hypctx_sys_reg (name just a placeholder, but maybe ok) is what hypctx_sys_reg is today in your patch. Because then it behaves like an lvalue, so you can write all of:

foo = hypctx_sys_reg(hypctx, SP_EL1);
hypctx_sys_reg(hypctx, SP_EL1) = bar;
hypctx_sys_reg(hypctx, SP_EL1) |= ...;
return (&hypctx_sys_reg(hypctx, SP_EL1));
sys/arm64/vmm/arm64.h
257

This is fragile as it's sensitive to include order, and if anything other than vmm_[n]vhe.c try to use this or the related functions they'll get the wrong implementation. Ideally this would therefore be moved to vmm_vhe.c, made unconditional, and the inline functions moved to vmm_whatever.c that gets included and needs them. And if generic compiled-once code needs to use these functions then you need a version that does dynamic dispatched based on is_vhe or whatever the function is.

259

uint64_t is not a pointer type. Also missing a space before the *.

268
281

As in we'd change the definition of the hypctx_sysreg enum itself? And then do "something" to special case it in the function? And have a new hypctx_sys_reg_cap that returned uintcap_t *? That seems like it could work. Though I've tagged @markj on this review as he did the initial adaptation of the proposed bhyve/arm64 code to Morello.

Something like that, yes. You could probably even do some preprocessor voodoo and make hypctx_sys_reg and the read/write functions macros that instantiate to the current function for normal registers, and to your hypctx_sys_reg_cap with its different return type for the capability-enabled ones.
As I've mentioned previously, currently all these functions and macros get compiled away to bare mrs & msr instructions. The "return type" is just there so that the compiler finds the right instruction.

Also, you could get rid of the need for specific read/write functions if you had something like:

I wish.. The read/write functions are just cosmetic and don't do anything at the moment, but they'll be necessary to handle certain architectural edge cases that'll come up when implementing support for nested virt. I'm just adding them ahead of time.
The reason I'm trying to funnel all reads and writes through these functions is so that when corrections need to be made to register reads and writes they only get made in one place.

As in we'd change the definition of the hypctx_sysreg enum itself? And then do "something" to special case it in the function? And have a new hypctx_sys_reg_cap that returned uintcap_t *? That seems like it could work. Though I've tagged @markj on this review as he did the initial adaptation of the proposed bhyve/arm64 code to Morello.

Something like that, yes. You could probably even do some preprocessor voodoo and make hypctx_sys_reg and the read/write functions macros that instantiate to the current function for normal registers, and to your hypctx_sys_reg_cap with its different return type for the capability-enabled ones.
As I've mentioned previously, currently all these functions and macros get compiled away to bare mrs & msr instructions. The "return type" is just there so that the compiler finds the right instruction.

Also, you could get rid of the need for specific read/write functions if you had something like:

I wish.. The read/write functions are just cosmetic and don't do anything at the moment, but they'll be necessary to handle certain architectural edge cases that'll come up when implementing support for nested virt. I'm just adding them ahead of time.
The reason I'm trying to funnel all reads and writes through these functions is so that when corrections need to be made to register reads and writes they only get made in one place.

Then why are some places using *hypctx_sys_reg directly and mutating it? What kinds of corrections need to be made?

Then why are some places using *hypctx_sys_reg directly and mutating it?

I left these wherever the existing code was doing operations like "&=" etc to not inflate the diff on this series even higher since it's currently not an issue. But yeah ideally those should be changed over as well. Especially the ones outside of vmm_reset.c.

What kinds of corrections need to be made?

There are some registers which will need a mask to be applied to them, there are some that need to be mapped to a different register, some that need to have certain bits translated and reordered and the like.