Page MenuHomeFreeBSD

bhyve(8): decouple unix domain socket from save/restore code
Needs ReviewPublic

Authored by rew on Jan 16 2021, 9:43 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Apr 1, 4:05 AM
Unknown Object (File)
Fri, Mar 29, 9:31 AM
Unknown Object (File)
Jan 29 2024, 2:19 PM
Unknown Object (File)
Jan 17 2024, 11:05 AM
Unknown Object (File)
Dec 29 2023, 5:12 AM
Unknown Object (File)
Dec 20 2023, 7:14 AM
Unknown Object (File)
Dec 19 2023, 7:10 AM
Unknown Object (File)
Dec 18 2023, 1:32 PM

Details

Reviewers
jhb
allanjude
Group Reviewers
bhyve
manpages
Summary

The recent save/restore functionality in bhyve(8) also introduced a unix domain
socket that is used by bhyvectl(8).

This patch moves the unix domain socket code into a separate file so that it could
be used without enabling the save/restore functionality. The socket code has been
moved to ipc.c and ipc.h. The code in ipc.c was copied from snapshot.c with
minor changes.

This splits up init_checkpoint_thread() into init_snapshot() and init_ipc().

init_ipc() creates the unix domain socket and a thread to handle requests.
init_snapshot() initializes mutex/cond variables used by the save/restore functionality.

Other changes:

  • expose vm_checkpoint() through snapshot.h
  • socket path is now '/var/run/bhyve/$vmname' instead of '/var/run/bhyve/checkpoint/$vmname'

The socket is still protected behind #ifdef BHYVE_SNAPSHOT guards.

The checkpoint_xxx variables probably deserved to be renamed.

I do have a patch that extends from this review that brings the unix domain socket in
front of the #ifdef BHYVE_SNAPSHOT guards. Also, it hooks up bhyvectl(8) to send
a message (other than suspend/checkpoint) to bhyve(8) over the socket. I mention it
because that's my motivation for splitting up the socket and save/restore code.

Test Plan

checkpoint, suspend, and restore still work on intel

checkpoint and suspend still function on amd - havent had
success using restore (on amd) with or without this patch.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 41082
Build 37971: arc lint + arc unit

Event Timeline

rew requested review of this revision.Jan 16 2021, 9:43 PM
usr.sbin/bhyve/ipc.c
103

Maybe rename this to ipc_thread?

In fact, I would rename several things in here to s/checkpoint/ipc/ except for the checkpoint-specific commands.

140

Perhaps as a followup, I think we want to change this to be a SOCK_DGRAM or SOCK_SEQPACKET socket, and we will want to define a common base structure for messages (with probably just an integer holding the message type) and then message-specific structures that actually get written to the socket. The checkpoint messages would use a message struct that includes the filename. I think checkpoint_op currently includes the vmname, but it isn't used and can be removed.

147

I would like to get rid of this and instead add /var/run/bhyve to BSD.mtree.var instead. You would they just leave off the mkdir entirely and just call bind().

usr.sbin/bhyve/snapshot.c
1441–1442

dont use mkdir() to create /var/run/bhyve

  • create /var/run/bhyve via BSD.mtree.var
  • updated hier(7)

vmmapi.h:

  • rename checkpoint_opcodes to ipc_opcode.
  • add struct ipc_message, this struct is written across the socket.

ipc_message stores an enum and union. The enum is used to track the type of
message being sent, while the union are the different structures that
correspond to the associated message type. Currently, the only structure in the union
is struct checkpoint_op.

ipc.c:

  • rename checkpoint_xxx to ipc_xxx.
  • rename get_checkpoint_msg() to get_message().
  • start using ipc_message within get_message().

bhyvectl/bhyvectl.c:

  • rename send_checkpoint_op_req() to send_message().
  • combine send_start_checkpoint() and send_start_suspend() into snapshot_request().

send_message() writes data over the socket, it now expects three arguments.
The first is a struct vmctx. The other two arguments are a void pointer
(to the data) and the length of the data (in bytes) to be sent.

You mentioned as a follow-up that we'll need a common base structure for messages-specific
structures to be written over the socket. I've included those changes in this review.
If it's easier (or you prefer), I can move that to a separate review.

I haven't changed the socket type from SOCK_STREAM yet, but have made a note
to do so - if you have a preference, let me know.

rew marked 3 inline comments as done.Jan 23 2021, 10:44 PM
usr.sbin/bhyve/ipc.c
140

As the socket is currently used, it seems SOCK_DGRAM would suffice - bhyvectl sends a message; bhyve receives the message and does the thing.

I'm curious, are there scenarios where a bhyve process would want to carry a conversation with another process?

I have a diff (based off this review) that uses SOCK_DGRAM instead: https://reviews.freebsd.org/differential/diff/82812/

bhyve/ipc: use SOCK_DGRAM

  • use SOCK_DGRAM instead of SOCK_STREAM
  • have bhyvectl(8) share ipc.h from bhyve(8)
  • fix naming semantics of MAX_SNAPSHOT_VMNAME, now called MAX_SNAPSHOT_FILENAME.

Previously, snapshot.h and bhyvectl.c each had their own
#define MAX_VMNAME 100, this is now shared at vmmapi.h - the
value has also been changed to 89. It's a limitation for the
unix domain socket path as constrained by SUNPATHLEN (104).
The prefix for the socket path is BHYVE_RUN_DIR, which by default
is '/var/run/bhyve/' (15). 104-15 = 89.

rew marked an inline comment as done.Feb 3 2021, 5:31 PM
bcr added a subscriber: bcr.

OK for the manpage bits.

Sorry if you were waiting for me. Do you have anything else you are waiting on to get this merged in?

usr.sbin/bhyve/ipc.h
46

Might be nice to leave 0 undefined.

61

Could maybe make the union anonymous and instead use more specific names like 'checkpoint' for the checkpoint_op?

usr.sbin/bhyve/ipc.h
61

One idea we had had was replacing this struct with an nvlist, so we face far fewer ABI issues when we want to extend the functionality of the control socket.

We maybe also want to include a version number, just for future proofing.

usr.sbin/bhyve/ipc.h
61

One idea we had had was replacing this struct with an nvlist, so we face far fewer ABI issues when we want to extend the functionality of the control socket.

We maybe also want to include a version number, just for future proofing.

My only concern is if we make this too complex it gets a bit hard to use. However, perhaps just using nvlist_send and nvlist_recv over the socket would be ok. Not sure what socket type nvlist expects to use? (SOCK_STREAM vs SOCK_DGRAM vs SOCK_SEQPACKET)

rew marked an inline comment as done and an inline comment as not done.Aug 16 2021, 9:37 PM
In D28199#708379, @jhb wrote:

Sorry if you were waiting for me. Do you have anything else you are waiting on to get this merged in?

No worries, I shouldn't have created such a large review from the get go. As heads up, some of the proposed changes from the original review have already been committed.

usr.sbin/bhyve/ipc.h
61

It looks like nvlist_send() uses send() under the covers; so if I understand correctly, it's expecting either SOCK_STREAM or SOCK_SEQPACKET.

If we want to use an nvlist; then I could look into doing one of the following:

  1. go back to SOCK_STREAM or SOCK_SEQPACKET
  2. add the functions nvlist_sendto() and nvlist_recvfrom() to libnv
  3. something else...?

My opinon on nvlist vs struct:

I could go either way. On one hand it's nice to have a simple/rudimentary interface (plain struct) to see where/if/how the control socket will evolve. One question that comes to mind..what if we want to send a plain struct over the socket? How will an nvlist handle that?

At the moment, the only other functionality I'm using the control socket for (using a local patch) is to trigger an ACPI poweroff from bhyvectl.

usr.sbin/bhyve/ipc.h
61

I've made the changes to allow nvlist_send()/nvlist_recv() to work with SOCK_DGRAM.

I'll get a review up shortly for bhyve to use an nvlist instead.

I don't see any manual page change in here. Am I missing something?

In D28199#636696, @rew wrote:

bhyve/ipc: use SOCK_DGRAM

  • use SOCK_DGRAM instead of SOCK_STREAM
  • have bhyvectl(8) share ipc.h from bhyve(8)
  • fix naming semantics of MAX_SNAPSHOT_VMNAME, now called MAX_SNAPSHOT_FILENAME.

Previously, snapshot.h and bhyvectl.c each had their own
#define MAX_VMNAME 100, this is now shared at vmmapi.h - the
value has also been changed to 89. It's a limitation for the
unix domain socket path as constrained by SUNPATHLEN (104).
The prefix for the socket path is BHYVE_RUN_DIR, which by default
is '/var/run/bhyve/' (15). 104-15 = 89.

The unix socket path limit could be worked around with connectat(2) and bindat(2) which would also work better with Capsicum.