Page MenuHomeFreeBSD

Bhyve: fix SMBIOS Type 17 table generation
ClosedPublic

Authored by bcran on Wed, Mar 18, 2:13 AM.

Details

Summary

Bhyve: fix SMBIOS Type 17 table generation

According to the SMBIOS specification (revision 2.7 or newer), the
extended module size field should only be used for sizes that can't
fit in the older size field.

Test Plan

Tested with FreeBSD: it now detects the correct amount of memory.
Also tested with Linux and Windows (Server 2008, 2012, 2016 and 2019).

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

bcran created this revision.Wed, Mar 18, 2:13 AM
bcran updated this revision to Diff 69619.Wed, Mar 18, 2:13 AM

Remove unrelated changes.

bcran retitled this revision from Bhyve: log message when rfb client connects to Bhyve: fix SMBIOS Type 17 table generation.Wed, Mar 18, 2:14 AM
bcran edited the summary of this revision. (Show Details)
grehan accepted this revision.Wed, Mar 18, 3:36 AM
grehan added a subscriber: grehan.

Looks good.

You might want to see what happens with > 64GB (or maybe 128GB, can't recall the value). At some point additional tables are required for Windows.

This revision is now accepted and ready to land.Wed, Mar 18, 3:36 AM
bcran added a comment.Wed, Mar 18, 6:55 PM

You might want to see what happens with > 64GB (or maybe 128GB, can't recall the value). At some point additional tables are required for Windows.

That sounds like something for a different patch, but I'll certainly plan to test that: the biggest machine I have at home has 128GB RAM, but I see https://www.packet.com/ have machines with up to 384GB I can use.

bcran added a comment.Fri, Mar 20, 5:57 PM

You might want to see what happens with > 64GB (or maybe 128GB, can't recall the value). At some point additional tables are required for Windows.

Ah, it's 32GB that's the limit: Windows Server 2008 R2 doesn't recognize more than that, I'm presuming because it doesn't know about the xsize field that was introduced around 2009 to support larger DIMMs.
I plan to work on a change that splits RAM up into 32GB sized DIMMs to work around that. At some point we might want to introduce an OS or SMBIOS version flag to control features like that.

In looking at the SMBios 2.7 change list in DSP01342.7.1.pdf (final version of 2.7) on page 134 There is a memory device (type 17) update that you also need to implement to allow DIMM's >32GB. In my work on VM_MAXCPU's I found it would actually be desirable at some point to get to 3.X version of some form, can not recall exactly what the minimum was right now, but reading a newer change log I see some interesting things like in 3.0.0 add GUID value for discovering SMBIOS tables in UEFI, which iirc would help our UEFI situation, add new entry for extended BIOS ROM size, oh here it is 3.0.0 Type 4, extned core, core eabled and thread count ranges. Looking at the change list in the 3.3.0 spec between 2.6 and 3.0.0 is not a huge task as much of it does not apply to bhyve.

But I think to bump so 27 you must implement the device type 17 change with this one, which fixes your current issue any way.

usr.sbin/bhyve/smbiostbl.c
766 ↗(On Diff #69619)

We need to be very careful about just bumping this because we added *some* feature from a new version, we must be sure the whole of the SMBios implementation in bhyve is conformat with that version. Luckly the difference between 2.6 and 2.7 is minor, but before this bumps we should look at all the changes to make sure they are implemented.

Argg.. you did at the new field. Hum, I am reading 3.3.0 type 17 now.

rgrimes requested changes to this revision.Fri, Mar 20, 7:12 PM

This code should work without a need to split memory up into 32GB Dimms

usr.sbin/bhyve/smbiostbl.c
258 ↗(On Diff #69619)

Comment is miss leading, this is size in kilobytes or megabytes depending on MSB.

708 ↗(On Diff #69619)

This calculation seems odd, I see later it is overwritten for large memory, cant we just continue to use extended all the time and leave this set at 0x7fff? Never mind, spec says not. Ok how about doing:
if (memsize <= (32GB-1MB)) {

  type17->size = size_MB; } else {
 type17->size = 0x7FFF;
 if (guest_himem <= 0 {
  #error should never happen
 } else {
fall into the code below, remove the if, we just did that with an error check

This would make it follow how the spec is written

716 ↗(On Diff #69619)

Should this be >= to, as if it is exactly 0x7fff, aka 32GB-1mb you'll end up with a memory size of 0.

718 ↗(On Diff #69619)

There is no check for truncation here

This revision now requires changes to proceed.Fri, Mar 20, 7:12 PM

add GUID value for discovering SMBIOS tables in UEFI, which iirc would help our UEFI situation

bhyve/UEFI has always searched for the SMBIOS table in memory and used that.

jhb added a subscriber: jhb.Thu, Mar 26, 6:10 PM
jhb added inline comments.
usr.sbin/bhyve/smbiostbl.c
708 ↗(On Diff #69619)

It is not overwritten as we allocate a separate type 17 entry for guest_himem, however, it does seem odd that we set the entry size for lomem to the full size instead of just the lomem size as we did before. That is, it seems we now create two entries for the same logical "DIMM". One holds the truncated size and one holds the extended size if we truncated, otherwise it holds zero, but we only create this second DIMM if guest_himem is > 0? I would expect us to do 1 of two things:

  1. 1 DIMM for both guest_lomem + guest_himem. (This closer to real hardware in practice)
  2. 1 DIMM for guest_lomem and 1 DIMM for guest_himem (what the older code does).
766 ↗(On Diff #69619)

If anything, it seems to be the other way around. We already define 2.7 fields in other types already (e.g. type 3 and types 16+19 for which we hardcode 2.7+ only encodings). I suspect we chose a conservative version number here to appease older guests previously.

This calculation seems odd, I see later it is overwritten for large memory, cant we just continue to use extended all the time and leave this set at 0x7fff? Never mind, spec says not. Ok how about doing:

Sorry, I only just saw the more recent comments. The problem is that older operating systems don't know to look at the extended field, and only ever look at the field which has a maximum value of 32GB.

bcran added inline comments.Thu, Mar 26, 11:36 PM
usr.sbin/bhyve/smbiostbl.c
766 ↗(On Diff #69619)

Ah, okay. That makes sense actually: we can define fields that are only defined by a newer version of the spec because systems which only know about the version we specify won't know to look at the newer fields, and shouldn't get confused.

bcran added inline comments.Thu, Mar 26, 11:56 PM
usr.sbin/bhyve/smbiostbl.c
718 ↗(On Diff #69619)

Thanks, I'll correct those issues.

bcran added inline comments.Fri, Mar 27, 12:15 AM
usr.sbin/bhyve/smbiostbl.c
766 ↗(On Diff #69619)

From reading the spec, it looks like bcdrev should match the major/minor fields. Do you think I should bump bcdrev to 0x26 here, or leave it as it is?

rgrimes added inline comments.Fri, Mar 27, 1:25 AM
usr.sbin/bhyve/smbiostbl.c
766 ↗(On Diff #69619)

From digging in svn blame it seems in r272007 Peter Grehan bumped smbios->minor without bumping the BCD value. I would suggest we start another review to fix, and eliminate the possibility of this happening again in the future with:

smbios_ep->bcdrev = ((smbios_ep->major & 0xF) << 4) | smbios_ep->minor & 0xF));

jhb added a comment.Fri, Mar 27, 3:59 PM

I think we should go with 2.7 btw. Systems shouldn't read 2.7 fields if we advertise 2.6 since those fields "shouldn't exist". OTOH, the spec does say systems are permitted to use the length of individual structures to determine which optional fields exist.

jhb added a comment.Fri, Mar 27, 4:04 PM

This calculation seems odd, I see later it is overwritten for large memory, cant we just continue to use extended all the time and leave this set at 0x7fff? Never mind, spec says not. Ok how about doing:

Sorry, I only just saw the more recent comments. The problem is that older operating systems don't know to look at the extended field, and only ever look at the field which has a maximum value of 32GB.

That is what the 0x7fff value is for when using extended btw, is to report the max possible size for legacy systems. I agree that we should use the non-extended size when it fits.

In re-reading the descriptions of types 17 and 19, I think we probably should only do 1 type 17 entry that is lomem + himem. We already DTRT for having separate type 19 entries for lomem and himem. This is similar to an i386 system with 4GB of RAM in one DIMM. You would have two type 19 entries, 1 for memory from 0 to the memio hole (3.5 GB or so) and one for the rest of RAM starting at 4GB, but you would have one type 17 entry for a 4GB DIMM. That would mean removing the whole 'guest_himem' case from type 17's initializer and using conditional logic as Rod suggested to set size or extended size to the total memory size.

In D24107#532391, @jhb wrote:

I think we should go with 2.7 btw. Systems shouldn't read 2.7 fields if we advertise 2.6 since those fields "shouldn't exist". OTOH, the spec does say systems are permitted to use the length of individual structures to determine which optional fields exist.

The seperate review would just correct the code so that bcdrev is derived from major/minor, then this review can do the bump to 2.7?

Rebecca is it ok with you if I go create a review to do the bcdrev fix?

bcran added a comment.Sat, Mar 28, 3:14 AM
In D24107#532391, @jhb wrote:

Rebecca is it ok with you if I go create a review to do the bcdrev fix?

Yes, please do.

bcran added a comment.Sat, Mar 28, 4:19 AM
In D24107#532393, @jhb wrote:

That is what the 0x7fff value is for when using extended btw, is to report the max possible size for legacy systems. I agree that we should use the non-extended size when it fits.

I just checked and @rgrimes is right: when the size field is set to 0x7FFF, dmidecode reports the RAM as "Size: 0 TB" - so we need to use the extended size field for anything equal to or greater than that.
Also, the spec does say this about using the size/extended_size fields:

"For compatibility with older SMBIOS parsers, memory devices smaller than (32 GB - 1 MB) 1399 should be represented using their size in the Size field, leaving the Extended Size field set to 0."

So we _should_ use the non-extended size when it fits.

bcran updated this revision to Diff 69944.EditedSat, Mar 28, 4:42 AM
  • Fixes based on review feedback.

    o Don't create two type 17 entries. o Use the extended size field for 0x7FFF and greater.
bcran marked 7 inline comments as done.Sat, Mar 28, 4:44 AM
bcran updated this revision to Diff 69945.Sat, Mar 28, 4:45 AM
  • Revert bumping the SMBIOS version.
bcran added inline comments.Sat, Mar 28, 4:52 AM
usr.sbin/bhyve/smbiostbl.c
718 ↗(On Diff #69619)

Sorry, do you mean it should check if the guest has less than 32MB, and if so store the size with the high bit set?

rgrimes added inline comments.Sat, Mar 28, 11:20 AM
usr.sbin/bhyve/smbiostbl.c
718 ↗(On Diff #69619)

Well phab has moved this to point at the wrong line, but no what I meant was you do a & 0x7fff or 0x7fffffff on variables that may lead to you truncated a value with no error or warning, which may lead to unexpected results.

Though I believe the &0x7fff is now fixed due to the logic changes, the &0x7ffffff could lead to a truncation and should probably just be a if > 0x7ffffff return error;

bcran updated this revision to Diff 69951.Sat, Mar 28, 4:53 PM
  • Rework type17 generation and error out on overflow.
bcran marked 2 inline comments as done.Sat, Mar 28, 4:53 PM
  • Rework type17 generation and error out on overflow.

There is no need to error out on overflow here. The type17 isn't used by most o/s's for determining how much memory is available: there is e820, the EFI memory map, etc etc. Perhaps print an error in bhyve, buit there doesn't seem any point in erroring out.

There is no need to error out on overflow here. The type17 isn't used by most o/s's for determining how much memory is available: there is e820, the EFI memory map, etc etc. Perhaps print an error in bhyve, buit there doesn't seem any point in erroring out.

Okay. Also, I don't think we're going to have guests configured with ~2PB memory any time in the near future :)

bcran updated this revision to Diff 69964.Sat, Mar 28, 11:33 PM
  • Change overflow error to a warning and truncate type 17 memory size.

There is no need to error out on overflow here. The type17 isn't used by most o/s's for determining how much memory is available: there is e820, the EFI memory map, etc etc. Perhaps print an error in bhyve, buit there doesn't seem any point in erroring out.

Okay. Also, I don't think we're going to have guests configured with ~2PB memory any time in the near future :)

Indeed :) And when 5-level paging h/w is available there will be a lot more changes needed than just htis.

Misconfigurations of guest memory that go past the supported host address lines will be caught earlier than this, and in a review I should be posting shortly.

  • Rework type17 generation and error out on overflow.

There is no need to error out on overflow here. The type17 isn't used by most o/s's for determining how much memory is available: there is e820, the EFI memory map, etc etc. Perhaps print an error in bhyve, buit there doesn't seem any point in erroring out.

I disagree, if infact for some reason that value has gotten >2PB something has gone wrong and what your going to end up with is certainly not what the user wanted.

rgrimes added inline comments.Sun, Mar 29, 1:13 AM
usr.sbin/bhyve/smbiostbl.c
766 ↗(On Diff #69619)

Why are these 2 lines showing up as what look like reverse diffs now?

bcran updated this revision to Diff 69979.Sun, Mar 29, 3:55 PM

Upload the full patch.

bcran marked an inline comment as done.Sun, Mar 29, 3:56 PM
bcran added inline comments.
usr.sbin/bhyve/smbiostbl.c
766 ↗(On Diff #69619)

Oops, I gave the wrong revision to "arc patch" - "HEAD~3" instead of "HEAD~5" so it didn't have the full patch. Should be fixed now, sorry.

jhb accepted this revision.Mon, Mar 30, 7:31 PM
jhb added inline comments.
usr.sbin/bhyve/smbiostbl.c
710 ↗(On Diff #69979)

Please stick to C style comments to be consistent with the rest of the code.

712 ↗(On Diff #69979)

I concur with only a warning and not error'ing here. DMI/SMBIOS tables are generally only used for cosmetics (FreeBSD uses it to be more tidy for dmesg, but all the "real" stuff uses SMAP / EFI memory map).

bcran marked an inline comment as done.Mon, Mar 30, 7:57 PM

@rgrimes Could you approve the latest patch, or provide comments on what should be changed please?

bcran updated this revision to Diff 70019.Mon, Mar 30, 8:00 PM
  • Change comments to C style.
bcran marked 2 inline comments as done.Mon, Mar 30, 8:00 PM
bcran added inline comments.
usr.sbin/bhyve/smbiostbl.c
710 ↗(On Diff #69979)

Good point. Fixed.

bcran edited the test plan for this revision. (Show Details)Mon, Mar 30, 8:05 PM
bcran edited the summary of this revision. (Show Details)

I am happy with the other fixes, not sure if there is a need to fix or error check memory size bit overflows

usr.sbin/bhyve/smbiostbl.c
699 ↗(On Diff #70019)

I just noticed this, its possible for these types to be overflowed as uint64_t/1024 can be 54 bits long and uint64_t/MB can be 44 bits.

rgrimes accepted this revision as: rgrimes.Mon, Mar 30, 10:05 PM
This revision is now accepted and ready to land.Mon, Mar 30, 10:05 PM
bcran marked 2 inline comments as done.Mon, Mar 30, 10:13 PM
bcran added inline comments.
usr.sbin/bhyve/smbiostbl.c
699 ↗(On Diff #70019)

I'll change them back to uint64_t before committing.

This revision was automatically updated to reflect the committed changes.
bcran marked an inline comment as done.