Fix bug on AMD Ryzen CPUs when virtualizing Linux using bhyve. MSR 0xc0011020.
Hello, this is my first contribution to FreeBSD. I'm sorry if I'm doing something wrong, this is my first time using the system.
I opened this Pull Request in GitHub.
middlewared was logging the following on my machine, so I thought this could help. Any help/advise is welcome!
[2019/03/03 06:11:12] (DEBUG) VMService.run():287 - ned: wrmsr to register 0xc0011020(0) on vcpu 1 [2019/03/03 06:11:12] (DEBUG) VMService.run():287 - ned: wrmsr to register 0xc0011020(0x400) on vcpu 6 [2019/03/03 06:11:12] (DEBUG) VMService.run():287 - ned: wrmsr to register 0xc0011020(0) on vcpu 7 [2019/03/03 06:11:12] (DEBUG) VMService.run():287 - ned: wrmsr to register 0xc0011020(0) on vcpu 3 [2019/03/03 06:11:12] (DEBUG) VMService.run():287 - ned: wrmsr to register 0xc0011020(0) on vcpu 6
Though this does fix your issue by silencing the log, it does little to improve the situation. I think what we may want to do is to add yet another option to the bhyve command that along with -w to ignore invalid MSR access to do so silently so that we do not have to add code like this everything a new msr comes into play. John, Patrick your thoughts on this approach?
I think that's a good idea. I would prefer if that was done in a different differential.
I would love to take on the challenge, but it could take me quite a while to implement that.
Is there an automated test that I could write for this? How does that work?
I think this fix is good for the short term. I can confirm seeing failed writes to that MSR when running a Linux guest on Zen hardware (which were non-fatal to Linux).
As for tunable wrmsr/rdmsr error logging, perhaps that's something to include in the discussion about a more expressive configuration system for bhyve?
This appears to be used to deal with a core hang on specific CPU's, we need to see that we have mitigated that in the host, and the decide what the appropriate action here is, I am uncomfortable with just making this silently disappear. Deferning to jhb.
José you can not commit the code without a commit access, jhb@ who is tagged in as a reviewer can however act on your pull request.
This is (one of) the AMD internal behavior control MSRs that is used to fix erratum in (at least) family 16h and 17h AMD processors. We write it in FreeBSD, too: https://github.com/freebsd/freebsd/blob/master/sys/amd64/amd64/initcpu.c#L120 . I think it's safe to ignore in hypervisors. @jhb would like to add a defined constant, and use it in identcpu.c too, rather than coding a magic number in another location.
I've chosen to name the constant MSR_LS_CFG since that seems more consistent with how FreeBSD has chosen constants for AMD MSRs. I've also put MSR_LS_CFG next to MSR_IC_CFG in an earlier section in xmsr.c.
FreeBSD at least expects to be able to read from this MSR as well as write it, though FreeBSD is careful to only do this on bare metal.
At least one place in Linux expects to be able to read this MSR without faulting though (the workaround for erratum 793 on family 16h), so I suspect we should permit this MSR to also be read and just always return 0 as we do for MSR_IC_CFG.
José, would you be able to test the updated version that permits read as well as write and verify it still works for your test case? It would be sufficient to just update your local patch to enable reading, no need to rebase on top of what I've committed so far.
@jhb, reopening a commited phab review messes with things, as now when you make a second commit the default top patch in this review well be the one you commit, and one has to go digging in the review history to find the prior patch that was in the first commit. It is fine to reopen a review for a revert, but I think one should start a new review for a new change, which is what this is.
thanks everyone with your help here :)
I am not sure how to run the patch locally, since I haven't been able to build Freenas within a jail.
I will report this to the freenas maintainers so they bring this in.
Not really. Both commits will end up in the "Commits" list at the top of the review (the cleanup commit is already listed now for example). In this case, the actual bug fix will be the second commit (the first one was a cosmetic cleanup) so it will be the more relevant one to show by default anyway. You can see an example at D20258 though that one phab did choose the first commit as the diff instead of the last commit. I think because I've explicitly uploaded a diff after the first commit, the real diff will still show up after all are committed.