Page MenuHomeFreeBSD

libvmmapi: Add support for setting up and configuring guest NUMA domains
ClosedPublic

Authored by bnovkov on Mar 30 2024, 4:33 PM.
Tags
None
Referenced Files
F125045922: D44566.id143123.diff
Sat, Aug 2, 8:13 PM
Unknown Object (File)
Thu, Jul 31, 9:32 PM
Unknown Object (File)
Sun, Jul 27, 5:27 PM
Unknown Object (File)
Sun, Jul 27, 5:19 PM
Unknown Object (File)
Sat, Jul 26, 5:20 AM
Unknown Object (File)
Sat, Jul 26, 12:45 AM
Unknown Object (File)
Fri, Jul 25, 10:24 PM
Unknown Object (File)
Fri, Jul 25, 9:06 PM
Subscribers

Details

Summary

This patch reworks libvmmapi to provide support for emulating NUMA domains in guests.

More specifically, it reworks vm_setup_memory to setup system memory segments for each guest NUMA domain, adds per-domain CPU affinity tracking to struct vm_ctx, and adds two new routines for fetching and setting a domain's CPU set.

An emulated NUMA domain is described by a struct vmdom in vmmapi.h. Aside from its size in bytes, each domain can be configured to use a specific domainset(9) policy and domain mask.
vm_setup_memory now takes two additional arguments - an array of struct vmdoms and the array's size. It then proceeds to set up a memory segment for each specified domain using the existing memory mapping scheme. If no domain info is passed, the memory setup falls back to the original, non-NUMA behaviour.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

bnovkov retitled this revision from libvmmapi: Add interfaces for setting and getting VM NUMA configuration to libvmmapi: Add support for setting up and configuring guest NUMA domains.
bnovkov edited the summary of this revision. (Show Details)

Update patch and summary.

bnovkov edited the summary of this revision. (Show Details)

Update patch with support for handling arbitrary domainset(9) policies.

Fix an issue when mapping segments larger than VM_LOWMEM_LIMIT.

lib/libvmmapi/amd64/vmmapi_machdep.c
31 ↗(On Diff #157461)

This should be sorted after types.h (which is special) and before ioctl.h.

lib/libvmmapi/vmmapi.c
463
1263

This function does nothing but update libvmm internal state, based on the already-created memsegs. This state is only used to implement vm_get_domain_cpus(), which I think is only used when building the SRAT. What's the purposes of keeping track of these cpusets here? Isn't it just duplicating state that's already in the kernel?

bnovkov added inline comments.
lib/libvmmapi/vmmapi.c
1263

This function does nothing but update libvmm internal state, based on the already-created memsegs. This state is only used to implement vm_get_domain_cpus(), which I think is only used when building the SRAT. What's the purposes of keeping track of these cpusets here? Isn't it just duplicating state that's already in the kernel?

They're here just for building the SRAT. AFAICT the kernel doesn't keep track of affinities (only topology information), and ISTR that we discussed this already in an earlier iteration of this patch and concluded that this info should be kept out of the kernel.
The problem here is that acpi_build only passes the struct vmctx to its internal functions (and I didn't want to change its signature).

lib/libvmmapi/vmmapi.c
1220

Hmm, I'm kind of wary using a cpuset_t on a library boundary. The size isn't really fixed, CPU_SETSIZE might change again someday, as it has recently (256->1024).

Does it make sense to have a function which sets the domain for a single vCPU, rather than a whole set? Then the caller can just do something like

CPU_FOREACH_ISSET(cpu, &cpus)
    vm_set_domain_cpu(ctx, domain, cpu);

but that doesn't work so well for vm_get_domain_cpus(), hmm.

1263

It's a bit gross, but, can we just have the mappings exist as global variables? Similar to how we have guest_ncpus and the topology variables in bhyverun.h.

(But I don't understand why you don't want to change acpi_build()? That seems less invasive than adding new library interfaces.)

lib/libvmmapi/vmmapi.h
67–69

Since this is a public structure, it'd be nice to give it a name that's consistent with the rest of the namespace (vm_*), maybe vm_mem_domain or similar?

Address @markj 's comments:

  • Remove cpu affinity tracking
lib/libvmmapi/vmmapi.c
1263

I think my initial motivation was to avoid coupling this mechanism to bhyve itself and have it be available to other consumers through libvmmapi

However, this doesn't make much sense if there's only one consumer using these mappings, so I've decided to track these mappings in bhyve's acpi.c instead (see the latest changes in D44567). This simplifies things greatly since we no longer need to track mappings in libvmmapi.

lib/libvmmapi/vmmapi.c
417

I think some more details would be useful here:

  • How do the domains get laid out in the guest physical address space?
  • There is a memsize parameter, plus each domain has a size field. Is it expected that the sum of the domain sizes equals memsize? What happens if it doesn't?
439

This looks like it logically belongs in vm_setup_memory().

lib/libvmmapi/vmmapi.h
67

This comment can be a single line.

70

Embedding a domainset_t in a public interface is a bit sketchy: if the structure size changes, the ABI will be broken. But for userspace we use a large set size (256), which should be ok for the foreseeable future...

Have you tested this on arm64? Please give it a try on snow1.

bnovkov marked 3 inline comments as done.

Address @markj 's comments and fix an edge case where VM_LOWMEM_LIMIT == 0 .

lib/libvmmapi/vmmapi.c
417

Hm, memsize is ultimately unnecessary since it was just used to calculate the size of the mmaped region, I've removed it from the arguments.

I've also left a few comments about the memory layout.

lib/libvmmapi/vmmapi.h
70

Hm, looking at the whole call chain we end up passing a pointer to the set anyway.
I'll switch this to a pointer and update the bhyve code.

Looks ok to me, just minor comments.

lib/libvmmapi/vmmapi.c
444

You may want to check that all of these sizes are a multiple of the page size. (Or does the kernel verify that?)

lib/libvmmapi/vmmapi.h
70

If you do that the caller needs to pass sizeof(domainset_t) as well. Right now the domainset is coming from the libvmmapi.so consumer, but libvmmapi.so itself is computing sizeof(domainset_t). So, if the maximum size is updated and libvmmapi.so is updated, but the consumer is not, things might go wrong.

Since libvmmapi.so itself doesn't care about the domainset (though that might change someday?), it'd be easy to just pass the domainset size together with the pointer.

121

Please keep the indentation consistent.

This revision is now accepted and ready to land.Sun, Jul 27, 1:26 PM