Page MenuHomeFreeBSD

Fix bug in bhyve by ignoring writes to errata MSRs in AMD cpus
ClosedPublic

Authored by jojo_eljojo.net on Mar 8 2019, 3:18 AM.

Details

Summary

Fix bug on AMD Ryzen CPUs when virtualizing Linux using bhyve. MSR 0xc0011020.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

jojo_eljojo.net created this revision.Mar 8 2019, 3:18 AM

Hello, this is my first contribution to FreeBSD. I'm sorry if I'm doing something wrong, this is my first time using the system.

I opened this Pull Request in GitHub.

middlewared was logging the following on my machine, so I thought this could help. Any help/advise is welcome!

[2019/03/03 06:11:12] (DEBUG) VMService.run():287 - ned: wrmsr to register 0xc0011020(0) on vcpu 1
[2019/03/03 06:11:12] (DEBUG) VMService.run():287 - ned: wrmsr to register 0xc0011020(0x400) on vcpu 6
[2019/03/03 06:11:12] (DEBUG) VMService.run():287 - ned: wrmsr to register 0xc0011020(0) on vcpu 7
[2019/03/03 06:11:12] (DEBUG) VMService.run():287 - ned: wrmsr to register 0xc0011020(0) on vcpu 3
[2019/03/03 06:11:12] (DEBUG) VMService.run():287 - ned: wrmsr to register 0xc0011020(0) on vcpu 6

Though this does fix your issue by silencing the log, it does little to improve the situation. I think what we may want to do is to add yet another option to the bhyve command that along with -w to ignore invalid MSR access to do so silently so that we do not have to add code like this everything a new msr comes into play. John, Patrick your thoughts on this approach?

I think that's a good idea. I would prefer if that was done in a different differential.
I would love to take on the challenge, but it could take me quite a while to implement that.

Is there an automated test that I could write for this? How does that work?

I think this fix is good for the short term. I can confirm seeing failed writes to that MSR when running a Linux guest on Zen hardware (which were non-fatal to Linux).

As for tunable wrmsr/rdmsr error logging, perhaps that's something to include in the discussion about a more expressive configuration system for bhyve?

This revision is now accepted and ready to land.Mar 9 2019, 5:17 PM
jojo_eljojo.net accepted this revision.EditedMar 9 2019, 6:19 PM

Thanks for the review. Is it possible for me to commit this myself or should someone else do it on my behalf? I have this code in a git branch on GitHub too: https://github.com/freebsd/freebsd/pull/392

edit: sorry I accepted without realizing. Is one approval enough?

This appears to be used to deal with a core hang on specific CPU's, we need to see that we have mitigated that in the host, and the decide what the appropriate action here is, I am uncomfortable with just making this silently disappear. Deferning to jhb.

José you can not commit the code without a commit access, jhb@ who is tagged in as a reviewer can however act on your pull request.

cem added a subscriber: cem.May 23 2019, 9:38 PM

This is (one of) the AMD internal behavior control MSRs that is used to fix erratum in (at least) family 16h and 17h AMD processors. We write it in FreeBSD, too: https://github.com/freebsd/freebsd/blob/master/sys/amd64/amd64/initcpu.c#L120 . I think it's safe to ignore in hypervisors. @jhb would like to add a defined constant, and use it in identcpu.c too, rather than coding a magic number in another location.

cem accepted this revision.May 23 2019, 9:39 PM

Er, I see you added a defined constant. We should use that in identcpu :-). Anyway, LGTM in principle.

This revision was automatically updated to reflect the committed changes.
jhb added a comment.May 23 2019, 11:39 PM

I've chosen to name the constant MSR_LS_CFG since that seems more consistent with how FreeBSD has chosen constants for AMD MSRs. I've also put MSR_LS_CFG next to MSR_IC_CFG in an earlier section in xmsr.c.

jhb reopened this revision.May 23 2019, 11:40 PM
jhb added a comment.May 23 2019, 11:54 PM

FreeBSD at least expects to be able to read from this MSR as well as write it, though FreeBSD is careful to only do this on bare metal.

At least one place in Linux expects to be able to read this MSR without faulting though (the workaround for erratum 793 on family 16h), so I suspect we should permit this MSR to also be read and just always return 0 as we do for MSR_IC_CFG.

jhb updated this revision to Diff 57812.May 23 2019, 11:55 PM
  • Support read and write.
jhb added a comment.May 23 2019, 11:59 PM

José, would you be able to test the updated version that permits read as well as write and verify it still works for your test case? It would be sufficient to just update your local patch to enable reading, no need to rebase on top of what I've committed so far.

@jhb, reopening a commited phab review messes with things, as now when you make a second commit the default top patch in this review well be the one you commit, and one has to go digging in the review history to find the prior patch that was in the first commit. It is fine to reopen a review for a revert, but I think one should start a new review for a new change, which is what this is.

thanks everyone with your help here :)

I am not sure how to run the patch locally, since I haven't been able to build Freenas within a jail.

I will report this to the freenas maintainers so they bring this in.

jhb added a comment.May 24 2019, 2:52 PM

@jhb, reopening a commited phab review messes with things, as now when you make a second commit the default top patch in this review well be the one you commit, and one has to go digging in the review history to find the prior patch that was in the first commit. It is fine to reopen a review for a revert, but I think one should start a new review for a new change, which is what this is.

Not really. Both commits will end up in the "Commits" list at the top of the review (the cleanup commit is already listed now for example). In this case, the actual bug fix will be the second commit (the first one was a cosmetic cleanup) so it will be the more relevant one to show by default anyway. You can see an example at D20258 though that one phab did choose the first commit as the diff instead of the last commit. I think because I've explicitly uploaded a diff after the first commit, the real diff will still show up after all are committed.

This revision was not accepted when it landed; it landed in state Needs Review.Jun 3 2019, 11:17 PM
This revision was automatically updated to reflect the committed changes.