Page MenuHomeFreeBSD

bhyve can miss PIR wake-ups
ClosedPublic

Authored by pmooney_pfmooney.com on Feb 22 2019, 10:33 PM.

Details

Summary

While running workloads under bhyve on SmartOS, we found that under certain conditions, posted interrupts could fail to properly wake a vCPU from a HLTed state. Linux guests were seemingly unaffected, but illumos guests easily triggered the behavior. The initial analysis is detailed in OS-6829. That fix was later amended in OS-6930. Finally, after feedback from tychon about where the cached priority was being stored, the change for OS-7354 was put in place.

This diff represents the accumulation of all three efforts, distilled into one fix for the issue.

Test Plan

The tickets noted above feature notes about how this was tested in SmartOS bhyve. With APICv-capable hardware, similar conditions could be replicated with a FreeBSD host.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

In the future it helps to upload full file diffs, not sure what vcs your using or what its command is, for svn I use svn diff -x U999999.

sys/amd64/vmm/intel/vmx.c
5 ↗(On Diff #54234)

Move this line after the all rights reserved line please, as that is part of NetApp copyright.
If Joyent wants to assert all rights, they should do so in there copyright, preferably:

  • Copyright (c) 2018 Joyent, Inc. All rights reserved.
3469 ↗(On Diff #54234)

FreeBSD style(9) pretty much says to not declare variables in inner blocks, these should be moved and sorted into the variables at the beginning of the function. The assignments can stay here, just the declare gets moved.

I used git format-patch assuming phabricator would pick up on the context stuff, but apparently not. I'll be sure to increase the context for the next revision.

sys/amd64/vmm/intel/vmx.c
5 ↗(On Diff #54234)

Will fix

3469 ↗(On Diff #54234)

That would mean discarding the 'const'. The point was to make the test and subsequent atomic assignment below more readable, without implying that the variables have any more than one state. style(9) doesn't expressly forbid it, and there are a few examples of scoped declarations elsewhere in the kernel, but I can make that change if it's an important issue.

imp added inline comments.Feb 23 2019, 6:31 AM
sys/amd64/vmm/intel/vmx.c
3469 ↗(On Diff #54234)

style(9) says no such thing. It used to, but that rule has been relaxed. Here it clearly makes sense to use.

sys/amd64/vmm/intel/vmx.c
5 ↗(On Diff #54234)

BTW: I used other files in bhyve as a guide, where they tended to list the various copyright lines before the "All rights reserved."
One example: https://reviews.freebsd.org/source/src/browse/head/usr.sbin/bhyve/iov.c$4

Updated diff to include full context

rgrimes added inline comments.Feb 27 2019, 8:42 PM
sys/amd64/vmm/intel/vmx.c
5 ↗(On Diff #54234)

There are plenty of bad examples in the tree, it is wrong to insert your copyright between another copyright and the words All rights reserved.

3469 ↗(On Diff #54234)

That is jhb@'s call

Fixed copyright attribution per Rod's comment.

rgrimes accepted this revision as: rgrimes.Feb 27 2019, 9:01 PM
This revision is now accepted and ready to land.Feb 27 2019, 9:01 PM
jhb added a comment.Feb 27 2019, 9:08 PM

Hmm, I guess it would be expensive to clear the bits from pending_prio when interrupts are actually delivered? And/or we can't tell when that is when PIR is in use? The only thing that slightly worries me is if the last hunk throws away too much data by clearing all the bits except for prio_bit rather than only clearing higher bits. I guess even if there are still lower priority bits pending, we don't need to worry about sending a notification. When the vCPU lowers it's TPR that will then notice and handle those interrupts without 'pending_prio' being involved.

We do, in a sense, clear those bits on interrupt delivering since it involves clearing the pending bit, meaning the next 0 -> 1 transition would incur a clean of pending_prio. Clearing the lower bits in pending_prio as part of vmx_pending_intr carries no risk of incurring extra work, since interrupt notification is triggered from having an incoming prio greater than what is cached in pending_prio. Besides, spurious wake-ups from the HLT loop to check for interrupts are much preferred to missing one.

jhb accepted this revision.Feb 28 2019, 5:50 PM

I recall seeing the earlier version of this before and approving it only to then have Tycho point out an issue I had failed to notice. I spent some time yesterday searching around in my MUA but couldn't find the thread. :-/ I think the issue you fixed from the bug you noted about not using the reserved PIR bits for the bitmask was the thing he had mentioned, I just wish I could confirm that for sure.

sys/amd64/vmm/intel/vmx.c
3549 ↗(On Diff #54501)

I'll add a blank line before this when committing, but I can do that as part of the commit, just FYI for future reference.

Yes, he was concerned about my use of the reserved bits. For a VMX-only case, it was probably fine, but if we ever want PCI-passthru to post interrupts directly to the guest, we'll need those bits. The write-up for OS-7354, which switched to the separate pending_prio field, covers the reasoning.

jhb added inline comments.Feb 28 2019, 10:36 PM
sys/amd64/vmm/intel/vmx.c
3468 ↗(On Diff #54501)

FYI, FreeBSD uses 'u_int' and doesn't have a 'uint_t'. I've fixed this during testing though and will just include it that way in the commit.

tychon accepted this revision.Mar 1 2019, 1:40 AM
tychon added a subscriber: tychon.

This looks great and addresses my concern from an earlier iteration. Thanks for fixing it!

This revision was automatically updated to reflect the committed changes.