Page MenuHomeFreeBSD

bhyve(8): decouple unix domain socket from save/restore code
Needs ReviewPublic

Authored by rew on Jan 16 2021, 9:43 PM.

Details

Reviewers
jhb
allanjude
Group Reviewers
bhyve
manpages
Summary

The recent save/restore functionality in bhyve(8) also introduced a unix domain
socket that is used by bhyvectl(8).

This patch moves the unix domain socket code into a separate file so that it could
be used without enabling the save/restore functionality. The socket code has been
moved to ipc.c and ipc.h. The code in ipc.c was copied from snapshot.c with
minor changes.

This splits up init_checkpoint_thread() into init_snapshot() and init_ipc().

init_ipc() creates the unix domain socket and a thread to handle requests.
init_snapshot() initializes mutex/cond variables used by the save/restore functionality.

Other changes:

  • expose vm_checkpoint() through snapshot.h
  • socket path is now '/var/run/bhyve/$vmname' instead of '/var/run/bhyve/checkpoint/$vmname'

The socket is still protected behind #ifdef BHYVE_SNAPSHOT guards.

The checkpoint_xxx variables probably deserved to be renamed.

I do have a patch that extends from this review that brings the unix domain socket in
front of the #ifdef BHYVE_SNAPSHOT guards. Also, it hooks up bhyvectl(8) to send
a message (other than suspend/checkpoint) to bhyve(8) over the socket. I mention it
because that's my motivation for splitting up the socket and save/restore code.

Test Plan

checkpoint, suspend, and restore still work on intel

checkpoint and suspend still function on amd - havent had
success using restore (on amd) with or without this patch.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 36280
Build 33169: arc lint + arc unit

Event Timeline

rew requested review of this revision.Jan 16 2021, 9:43 PM
usr.sbin/bhyve/ipc.c
102

Maybe rename this to ipc_thread?

In fact, I would rename several things in here to s/checkpoint/ipc/ except for the checkpoint-specific commands.

139

Perhaps as a followup, I think we want to change this to be a SOCK_DGRAM or SOCK_SEQPACKET socket, and we will want to define a common base structure for messages (with probably just an integer holding the message type) and then message-specific structures that actually get written to the socket. The checkpoint messages would use a message struct that includes the filename. I think checkpoint_op currently includes the vmname, but it isn't used and can be removed.

146

I would like to get rid of this and instead add /var/run/bhyve to BSD.mtree.var instead. You would they just leave off the mkdir entirely and just call bind().

usr.sbin/bhyve/snapshot.c
1443

dont use mkdir() to create /var/run/bhyve

  • create /var/run/bhyve via BSD.mtree.var
  • updated hier(7)

vmmapi.h:

  • rename checkpoint_opcodes to ipc_opcode.
  • add struct ipc_message, this struct is written across the socket.

ipc_message stores an enum and union. The enum is used to track the type of
message being sent, while the union are the different structures that
correspond to the associated message type. Currently, the only structure in the union
is struct checkpoint_op.

ipc.c:

  • rename checkpoint_xxx to ipc_xxx.
  • rename get_checkpoint_msg() to get_message().
  • start using ipc_message within get_message().

bhyvectl/bhyvectl.c:

  • rename send_checkpoint_op_req() to send_message().
  • combine send_start_checkpoint() and send_start_suspend() into snapshot_request().

send_message() writes data over the socket, it now expects three arguments.
The first is a struct vmctx. The other two arguments are a void pointer
(to the data) and the length of the data (in bytes) to be sent.

You mentioned as a follow-up that we'll need a common base structure for messages-specific
structures to be written over the socket. I've included those changes in this review.
If it's easier (or you prefer), I can move that to a separate review.

I haven't changed the socket type from SOCK_STREAM yet, but have made a note
to do so - if you have a preference, let me know.

rew marked 3 inline comments as done.Jan 23 2021, 10:44 PM
usr.sbin/bhyve/ipc.c
139

As the socket is currently used, it seems SOCK_DGRAM would suffice - bhyvectl sends a message; bhyve receives the message and does the thing.

I'm curious, are there scenarios where a bhyve process would want to carry a conversation with another process?

I have a diff (based off this review) that uses SOCK_DGRAM instead: https://reviews.freebsd.org/differential/diff/82812/

bhyve/ipc: use SOCK_DGRAM

  • use SOCK_DGRAM instead of SOCK_STREAM
  • have bhyvectl(8) share ipc.h from bhyve(8)
  • fix naming semantics of MAX_SNAPSHOT_VMNAME, now called MAX_SNAPSHOT_FILENAME.

Previously, snapshot.h and bhyvectl.c each had their own
#define MAX_VMNAME 100, this is now shared at vmmapi.h - the
value has also been changed to 89. It's a limitation for the
unix domain socket path as constrained by SUNPATHLEN (104).
The prefix for the socket path is BHYVE_RUN_DIR, which by default
is '/var/run/bhyve/' (15). 104-15 = 89.

rew marked an inline comment as done.Feb 3 2021, 5:31 PM
bcr added a subscriber: bcr.

OK for the manpage bits.