Page MenuHomeFreeBSD

bhyve - Snapshot Save and Restore
Needs ReviewPublic

Authored by darius.mihaim_gmail.com on Mar 7 2019, 4:56 PM.

Details

Reviewers
None
Group Reviewers
bhyve
Summary

Virtual machine save and restore - currently only supports virtual machine suspend and resume due to a lack of support for disk device snapshots. The implementation is available on
GitHub.

The feature is functional for Intel processors, for certain devices. The test matrix is available on the project's wiki.

The project has not been tested on AMD processors, but Linux guests seem to freeze at least in some cases (e.g., the virtual machine is restored after the host is rebooted), so support is limited.

Known issues:

  • only one device of a type should be used
  • FreeBSD and Linux guests may be frozen for a few seconds or report disk errors with AHCI block devices (data does not seem to be lost)
  • e1000 on Linux will sometimes not work after a restore
Test Plan

For all guests testing has been done as (specific devices may differ, as noted below):

  • constantly ping a remote server
  • start copying a large file
  • suspend the virtual machine
  • [optionally] reboot the host
  • restore the virtual machine
  • all functionality should continue as expected

FreeBSD guest

  • use virtio-net / e1000, ahci and frame buffer with xhci

Linux guest

  • use virtio-net, ahci and frame buffer with xhci

Windows guest

  • use virtio-net / e1000, ahci and frame buffer with xhci

Diff Detail

Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

pi added a subscriber: pi.Mar 7 2019, 5:16 PM
emaste added a subscriber: emaste.Mar 7 2019, 5:23 PM
cem added a subscriber: cem.Mar 7 2019, 6:25 PM

I have no idea on the internals, but some quick comments as I tried to read the code as-is.

sys/amd64/include/vmm.h
402

This change seems to have been erroneously made.

413

First line of a comment should be /*. See style(9)

sys/amd64/vmm/amd/vmcb.c
468

If error is already set, why are we running vm_get_register? We are overwriting error with the return value from vm_get_register at that point.

484

This has the same issue as the code above.

sys/amd64/vmm/vmm.c
824

Single statement if/else does not require braces.

dab added a subscriber: dab.Mar 8 2019, 4:57 AM
dab added inline comments.
lib/libvmmapi/vmmapi.c
111–114

This code could be somewhat simplified to:

asprintf(&vm_checkpoint_file, "/dev/vmm/%s_mem", name);
assert(vm_checkpoint_file != NULL);
129–130

Is failure to open the checkpoint device not an overall failure of this function? In other words, should there be a goto err here?

Also, I am not sure why failure to open the checkpoint device merits a printf() but failure to open the vm device does not. Should this just silently fail (goto err) as in the above case?

241–245

Why not just return (ioctl(...)) and skip the error variable?

277

Why #if 1?

478–480

It would seem useful to return an error if max_len is too small to hold the name. Also, I would use strlcpy() in preference to snprintf(). Lastly, I would declare max_len to be size_t.

lib/libvmmapi/vmmapi.h
131

Why not declare max_len as size_t?

sys/amd64/include/vmm_snapshot.h
38–40

Perhaps there should be a function to determine the snapshot length rather than leaving it up to someone to interpret this comment and reach in to the struct components to calculate it.

42

It looks like this line could be deleted.

85

s/in stead/instead/

sys/amd64/vmm/intel/vmcs.c
481–482

There is no need for this if and goto; the code will fall through to done naturally.

485

There is no need for this goto; the code will fall through to done naturally.

513–517

See comments in function above.

541–545

See comments in function above.

araujo added a subscriber: araujo.Mar 8 2019, 7:42 AM
lib/libvmmapi/vmmapi.c
241–245

I'm not particularly fond of return (func(params));. Would it be more inline with the coding style rewriting it like this?

sys/amd64/include/vmm_snapshot.h
38–40

There is a function called vm_get_snapshot_size - not that you mention it, I realize that the information in the comment should have mentioned "snapshot size" - that computes the size (or length) of the snapshot data

42

It was left there to point that there wasn't a need for an explicit field. I may remove this line and the comment above if people agree that they shouldn't really be there.

sys/amd64/vmm/intel/vmcs.c
481–482

The decision was made to keep things symmetrical on all branches of the if (similar to breaks for all cases of a switch-case), and also as a precaution in case more code is added after the if.

485

Same as the previous case.

513–517

Same observations.

541–545

Same observations.

darius.mihaim_gmail.com edited the test plan for this revision. (Show Details)Mar 8 2019, 10:20 AM
dab added inline comments.Mar 8 2019, 1:17 PM
lib/libvmmapi/vmmapi.c
241–245

I saw other places in this code where return (ioctl(...)) was done, so it wouldn't be out of character with the rest of the code. That being said, I don't think style(9) makes any mention of this one way or the other. I like the simplified code, but it is totally up to you.

sys/amd64/include/vmm_snapshot.h
38–40

Ah, missed that. Thanks.

42

Maybe just mention vm_get_snapshot_size() in the comment.

sys/amd64/vmm/intel/vmcs.c
481–482

I prefer the simpler (and less verbose) code, but can see your point.

emaste added inline comments.Mar 8 2019, 2:33 PM
lib/libvmmapi/vmmapi.c
129–130

Agreed the printf is probably unnecessary, but if we want to keep it, should probably be warnx instead.

241–245

For something very simple like this I would go with return(function(...)), but the current one is fine. However, style(9) would omit the extra blank line before the return.

266

style(9) wants

/*
 * Multiline comments
 * to look like this.
 */
1598

style discourages // comments

lib/libvmmapi/vmmapi.h
131

We should prefer size_t (and bool) where appropriate in new code - of course this is relatively rare in FreeBSD today because much of the code predates these types.

sys/amd64/include/vmm.h
702

Some comments in this change /* Start with a capital letter and end with a period. */ and /* some are all lowercase without a period */. Style(9) wants the former.

cem added inline comments.Mar 8 2019, 3:56 PM
lib/libvmmapi/vmmapi.c
241–245

style(9) mentions this explicitly: "Values in return statements should be enclosed in parentheses" (as well as by example).

emaste added inline comments.Mar 8 2019, 5:15 PM
lib/libvmmapi/vmmapi.c
241–245

It's not a question of parens vs not, it's

err = foo()

return (err);

vs

return(foo());
novel added a subscriber: novel.Mar 9 2019, 5:51 AM

Fix issues pointed by inline comments

darius.mihaim_gmail.com marked 16 inline comments as done.Mar 10 2019, 10:52 AM
darius.mihaim_gmail.com added inline comments.
lib/libvmmapi/vmmapi.c
129–130

Removed the print for now and replaced it with a goto err as suggested. May be useful to have the VM simply not support suspend/resume, instead of failing to create it when the device is not created.

241–245

Went with return (ioctl(...)); in the end, since the function is simple enough.

277

Was forgotten there as a sanity check. I have removed it.

darius.mihaim_gmail.com marked an inline comment as done.
  • remove guest memory mapping device that is currently unused. The device was used to make snapshots faster by creating a separate Copy-on-Write mapping of the guest memory. This approach is not useful for guest suspends, and we may be able to avoid exporting it through a device even for snapshots (will be investigating that approach).
  • remove some unused code

Remove some unused code and fix style of comments

novel added a comment.Apr 8 2019, 5:41 PM

This is a very useful feature, thanks for working on that!

However, it seems it could be more user-friendly.

Few observations:

  1. How do I know status of save/restore? If I do bhyvectl --vm=... --suspend=..., it seems to just exit with 0. Is there any way to figure out if the suspend attempt was successful or monitor progress? Maybe it would be good to have blocking mode for this operation (i.e. bhyvectl doesn't exit until it's finished).
  2. Same for --checkpoint.
  3. Is there a way to list all available snapshots (with creation time)?
  4. It looks like it's terminating the bhyve process even if the running VM configuration doesn't support snapshotting, which is pretty unfortunate (also, it would be good if snapshotting errors could be displayed by bhyvectl, not bhyve stdout).
usr.sbin/bhyve/pci_emul.c
2039

I get these messages for devices that I don't actually have in a VM, is that expected behavior? Also, why is that using dos-style (\r\n) line endings?

This is a very useful feature, thanks for working on that!

Thank you very much for testing it!

However, it seems it could be more user-friendly.

I agree, but we have been focusing on functionality for the most part.

Few observations:

  1. How do I know status of save/restore? If I do bhyvectl --vm=... --suspend=..., it seems to just exit with 0. Is there any way to figure out if the suspend attempt was successful or monitor progress? Maybe it would be good to have blocking mode for this operation (i.e. bhyvectl doesn't exit until it's finished).

Could be done, but for now bhyvectl simply writes to the bhyve's checkpoint UNIX socket, and no data is sent back. Simply reading from the socket in bhvyectl to make it wait for a status message from bhyve may suffice. For progress, an extension of this where bhyve sends the current progress is also possible. A very large part of the time (almost all of it, for VMs that have more than the minimal amount of RAM required by the OS) is spent to save memory, so the progress may be saved_memory / total_memory.

  1. Same for --checkpoint.

Checkpoints are currently redirected to suspends, due to a lack of support for snapshots for virtual disks. After disk support is added, proper checkpoints will also be added, but they should be created immediately. That means that bhyvectl should only get/show the fail/success status without blocking (too long).

  1. Is there a way to list all available snapshots (with creation time)?

Currently none. Closest you can get to that is listing the creation time of the snapshot files.

  1. It looks like it's terminating the bhyve process even if the running VM configuration doesn't support snapshotting, which is pretty unfortunate (also, it would be good if snapshotting errors could be displayed by bhyvectl, not bhyve stdout).

I'm not entirely sure how/why it is killed. Please add the output from bhyve, or feel free to E-mail me.

usr.sbin/bhyve/pci_emul.c
2039

It's because the devices are in a static array in snapshot.c - that's also why only one device of each type is allowed. When we'll add support for dynamically registering devices, these messages shouldn't appear anymore.

The \r\n line endings are put there because the output from bhyve was going to the next line (new line was inserted), but continued from where the last line ended (no carriage return). We didn't find why this happens, and went with the \r\n newlines instead. No real reason to keep them if the output behaves, though.

Add minor fixes to snapshot logic:

  • correct variable used as parameter for 'sizeof' in vatpic.c (the size was correct before, by coincidence)
  • remove a comment that became irrelevant in the past in vhpet.c
  • fix vLAPIC timer reset logic in the off-chance the timer would be periodic and a value of 0 is saved for the counter
krion added a subscriber: krion.Wed, May 22, 8:37 PM