Page MenuHomeFreeBSD

Bhyve acpi MADT table correction for VM_MAXCPU > 21
ClosedPublic

Authored by rgrimes on Jan 11 2019, 1:08 AM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Jan 10, 7:06 AM
Unknown Object (File)
Fri, Jan 10, 6:14 AM
Unknown Object (File)
Thu, Jan 9, 10:22 PM
Unknown Object (File)
Sun, Jan 5, 7:56 AM
Unknown Object (File)
Tue, Dec 31, 12:18 PM
Unknown Object (File)
Dec 22 2024, 8:16 AM
Unknown Object (File)
Dec 8 2024, 11:59 PM
Unknown Object (File)
Nov 1 2024, 3:40 AM

Details

Summary

The bhyve acpi MADT table was given a static space of 256 (0x100) bytes, this is enough space to allow VM_MAXCPU to be 21, this patch fixes that so VM_MAXCPU can be of arbitrary value and not overflow the space by actually calculating the space needed for the table.
This is a fix for PR212782

Test Plan

Compile vmm.ko, and related code (libvmmapi, bhyve and bhyveload) with VM_MAXCPU=24 and boot a VM, note that the panic in virtualization maillist no longer occurs

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

rgrimes edited the test plan for this revision. (Show Details)
araujo requested changes to this revision.Jan 15 2019, 12:19 AM

It doesn't looks quite right! I need more time to articulate better arguments, but this patch as it is, must not be committed in.

usr.sbin/bhyve/acpi.c
80 ↗(On Diff #52754)

Gratuitous change that doesn't add anything. Mind let it be like was before?

This revision now requires changes to proceed.Jan 15 2019, 12:19 AM
usr.sbin/bhyve/acpi.c
87 ↗(On Diff #52754)

Document this magic formula, that information shall be needed later when this whole table is built at run time.

I have applied these related VM_MAXCPU patches to 12.0R and rebuilt the related components.

https://reviews.freebsd.org/D18815
https://reviews.freebsd.org/D18816
https://reviews.freebsd.org/D18846
https://reviews.freebsd.org/D18755

It appears we have an ACPI limit of 65 sockets that is unrelated to the VM_MAXCPU limit.

Using a mixture of sockets, cores, and threads such as:

"-c sockets=1,cores=255,threads=1" or "-c sockets=8,cores=8,threads=2"

I could reach 255 vCPUS with FreeBSD 10.4, 11.1R, and 12.0R:

ACPI APIC Table: <BHYVE BVMADT >
FreeBSD/SMP: Multiprocessor System Detected: 255 CPUs
FreeBSD/SMP: 1 package(s) x 255 core(s)

sysctl hw.ncpu

hw.ncpu: 255

I could reach 192 vCPUs with NetBSD, CentOS 7, and Debian 9.4.0, and 64 vCPUs with OpenBSD as hard-defined:

/arch/amd64/include/cpu.h:#define MAXCPUS 64 /* bitmask */

NetBSD and Linux would drop to 1 CPU when their limit is exceeded while FreeBSD will throw errors: (Not sure which behavior is preferred.)

With "sockets=1,cores=257,threads=1"
Booting /tmp/bhyve.BHl9LfJ 1812: [0001] Processor ID : 100

                                                                        Error    6303 -     Integer too large for target ^
(0000000000000100 - max 1 bytes)

                                                                           /tmp/bhyve.BHl9LfJ   1813: [0001]         Local Apic ID : 100
                                                         Error    6303 -      Integer too large for target ^ (0000000000000100 - max 1 bytes)

                                                              Assertion failed: (error == 0), function main, file /usr/src/usr.sbin/bhyve/bhyverun.c, line 1115.
Abort trap (core dumped)

This reminds us that we have a wrapping issue on our error messages that are unrelated to these patches.

I will continue testing but thus far, the 255 vCPU limit appears from bhyve's construction of the ACPI table, which is external to these patches.

Please send your testing ideas, especially related to the concerns raised.

To use these patches, first increase VM_MAXCPU from 16 in:

/usr/src/sys/amd64/include/vmm.h
/usr/include/machine/vmm.h

I used 1024 in my testing.

My my Phabricator has its quirks. Sorry for the extra message.

I did a quick MAXCPU increase test on 12.0R based on comments by cpersiva@ but the results were neither positive nor directly related to these patches.

usr.sbin/bhyve/acpi.c
42 ↗(On Diff #52754)

We need to update this table. In particular, you cannot overflow beyond the 1MB boundary. Legacy BIOS boot loaders assume that RAM starting at 1MB is free and will overwrite it. FreeBSD's boot loader does this (though bhyveload works around this by accident). FreeBSD's kernel is always located at 2MB (PAE/amd64) or 4MB (i386), though older FreeBSD kernels ran at 1MB (early 4.x). Other OS's may make similar assumptions (e.g. Linux). EFI doesn't suffer from this as EFI provides an explicitly memory map and manages all of RAM. However, for the non-EFI case (which is what these tables are for), we can't overflow the 1MB boundary. We should probably just pick a vCPU limit for non-EFI booting that keeps us under 1MB and tell folks who want more CPUs to use EFI.

80 ↗(On Diff #52754)

Well, size constants are being added, but the new comment does need to be wrapped properly. (I use M-q in Emacs to rewrap comments)

rgrimes marked an inline comment as done and an inline comment as not done.Feb 19 2019, 5:55 PM
rgrimes added inline comments.
usr.sbin/bhyve/acpi.c
42 ↗(On Diff #52754)

This is a static table for now at compile time, I shall add a CTASSERT that makes sure we do not over flow the space reserved for it. When this table is built at runtime that well need to become a runtime check.

To use these patches, first increase VM_MAXCPU from 16 in:

/usr/src/sys/amd64/include/vmm.h
/usr/include/machine/vmm.h

I used 1024 in my testing.

My my Phabricator has its quirks. Sorry for the extra message.

You can't usefully go above 255 CPUs without a lot more work. The MADT table requires separate entry types for APIC IDs > 254 (the field is 8 bits). However, there is a much larger issue beyond that which is that interrupts are routed with 8 bit APIC IDs (0 - 254 as well) in I/O APIC interrupt pin registers and MSI interrupt data fields. To deal with APIC IDs > 254, an I/O MMU is required that includes an interrupt remapping table that maps (8-bit APIC ID, IDT vector) -> (32-bit x2APIC ID, IDT vector). Any OS that can support more than 255 CPUs such as FreeBSD or Linux will require this. This means we would need to provide an emulated I/O MMU as part of bhyve for guest OS's to use. This represents a fairly large bit of work and is just the way x86 works. In addition, the work is more complicated than just remapping cookies as I'm not sure how it interacts with hardware virtualization exceptions like interrupt posting that try to bypass the VMM and inject pass-through interrupts directly into a guest. It's probably doable, but it's a very nontrivial amount of work, and is probably not worth doing until there's a compelling use case for more than 255 CPUs. Especially since bhyve doesn't handle cases where the number of vCPUs is > number of cores very well.

Wrap comment text, provide details of MADT_SIZE formula

rgrimes added inline comments.
usr.sbin/bhyve/acpi.c
42 ↗(On Diff #52754)

The total size of this table is on the order of a few kbytes, it has room for 54kbytes, I have added a max of (254 * 8)==2032 bytes, the table is no more in danger of overflowing now than it was before.

The potential overflow bug existed before, though near impossible to occur.

However I shall in the future when this ACPI table generation gets properly rewritten include a check for overflow of the 1MB limit as I plan to remove all these hard coded
constants and use the sizes of the objects returned by the iasl compiler as I am pretty sure some of them are out of date, or just "plenty big" for the objects stored in them.

There is no reason to just pick a maxcpu limit for the ACPI case, the code has already been tested to 4 x the limit that we are going to have to impose because of the APIC id being 8 bit.

It doesn't looks quite right! I need more time to articulate better arguments, but this patch as it is, must not be committed in.

Please comment and provide a reason for blocking this, or remove your blocking status.
Thanks,
Rod

usr.sbin/bhyve/acpi.c
42 ↗(On Diff #52754)

Ok, that sounds good. I think we just want bhyve to exit with a suitable error (you have too many CPUs or something else) if we generate a DSDT that overflows 1MB.

This revision is now accepted and ready to land.Feb 22 2019, 1:20 AM
pmooney_pfmooney.com added inline comments.
usr.sbin/bhyve/acpi.c
87 ↗(On Diff #52754)

Much appreciated reference material

This revision was automatically updated to reflect the committed changes.