
usr.sbin/bhyve: Implement a generic block store backend interface
Needs Review (Public)

Authored by wjw_digiware.nl on Jan 2 2020, 5:11 PM.

Details

Reviewers
mav
ryan_freqlabs.com
Group Reviewers
bhyve
manpages
Summary

To be able to plug in different storage systems, an abstraction
layer for the block storage is "required".

This is currently implemented in the existing block_if.{hc}
files.
The current callee access routines have their prefixes renamed
from blockif_ to blocklocal_,
and have moved into a separate pair of files, block_local.{hc}.
Functionally there should not be any change, other than that the
calls to the storage API now go through an extra set of tabled
functions.
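
For illustration only (the names below are placeholders, not necessarily the ones used in this diff), the "tabled functions" idea boils down to a struct of function pointers that each storage provider fills in, with the rest of bhyve calling through that table:

#include <sys/types.h>

struct block_backend {
	const char	*bb_name;	/* e.g. "local" */
	void		*(*bb_open)(const char *optstr);
	ssize_t		(*bb_read)(void *ctx, void *buf, size_t len, off_t off);
	ssize_t		(*bb_write)(void *ctx, const void *buf, size_t len, off_t off);
	int		(*bb_close)(void *ctx);
};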

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 29389
Build 27280: arc lint + arc unit

Event Timeline


Any chance you have a second to review this, mav?

ryan_freqlabs.com requested changes to this revision.Jan 18 2020, 7:46 PM

I have not inspected the code in depth, but there are a number of style violations that jump out at a glance. Fixing these various nits will make the changes easier to read for reviewers. Rather than clutter up the diff I'll just mention the key points:

  • Several files appear to have been edited with 4-space indentation settings rather than hard tabs, in contrast with the surrounding code. The different whitespace is confusing/frustrating.
  • A bunch of added functions in block_if.c should have a newline after the function return type, rather than a bunch of horizontal space.

See style(9) as a more detailed guide, but in general matching the style of surrounding code is most desirable.
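
Concretely, for the second point, the style(9) layout for a function definition looks like this (the function name here is made up):

static int
blockif_example_op(struct blockif_ctxt *bc)
{
	/* return values are parenthesized as well */
	return (0);
}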

One other point I would like to see addressed is some form of documentation describing what is necessary in order to implement additional storage backends. Maybe the local block backend code can be moved to its own file to serve as a clean example.

This revision now requires changes to proceed.Jan 18 2020, 7:46 PM

I have not inspected the code in depth, but there are a number of style violations that jump out at a glance. Fixing these various nits will make the changes easier to read for reviewers. Rather than clutter up the diff I'll just mention the key points:

  • Several files appear to have been edited with 4-space indentation settings rather than hard tabs, in contrast with the surrounding code. The different whitespace is confusing/frustrating.
  • A bunch of added functions in block_if.c should have a newline after the function return type, rather than a bunch of horizontal space.

See style(9) as a more detailed guide, but in general matching the style of surrounding code is most desirable.

I'll fix up these style issues.

One other point I would like to see addressed is some form of documentation describing what is necessary in order to implement additional storage backends. Maybe the local block backend code can be moved to its own file to serve as a clean example.

I've taken the parallel of the networking multiplexer abstraction, where the "standard" network code is also in net_backends.{hc}.
And thus far I have not been able to find any documentation other than comments in the code files.
But refactoring the regular block_if into separate files is an easy start.

araujo added a subscriber: araujo.Jan 20 2020, 1:42 AM

Have you ever taken a look at: https://github.com/xcllnt/libvdsk
It seemed to me a better idea than what you are trying to achieve here, and there is an experimental implementation of qcow2 there.

I'm much more in favor of libvdsk than this implementation proposed here.

Have you ever taken a look at: https://github.com/xcllnt/libvdsk
It seemed to me a better idea than what you are trying to achieve here, and there is an experimental implementation of qcow2 there.
I'm much more in favor of libvdsk than this implementation proposed here.

To be honest, no, I had not seen this before.
And to be honest, what I'm trying to implement, although for a different reason, is more or less available there.
I guess if this had been committed I would probably have tried to fit it into this library.

Having said that, my code is a much simpler approach that just tries to abstract the block interface.
No more, no less. Some parts of the code look quite similar, some parts of the code look quite a bit more
complex. Not sure if that is due to the desire to also be able to interpret VM disk formats.
Being able to interpret and use all the different virtual disk formats is a task that lies on top of that.
And as such I would think that it would be implemented on top of the generic block backend.

The question is why this did not get committed?
The last updates to the code are 4 months old, and I guess that merging it as it is will not work.
Has Marcel lost interest in continuing with this?

Getting the block_backend multiplexer in will probably change the libvdsk code, in that it'll
be one of the block citizens under the backend multiplexer, and make that part of the code
superfluous.

My need is simple: I need to be able to write a different backend, and this is for me the smallest
impact on the current code to get that done.
So just for that reason I would prefer that.

Have you ever taken a look at: https://github.com/xcllnt/libvdsk
It seemed to me a better idea than what you are trying to achieve here, and there is an experimental implementation of qcow2 there.
I'm much more in favor of libvdsk than this implementation proposed here.

To be honest, no, I had not seen this before.
And to be honest, what I'm trying to implement, although for a different reason, is more or less available there.
I guess if this had been committed I would probably have tried to fit it into this library.
Having said that, my code is a much simpler approach that just tries to abstract the block interface.
No more, no less. Some parts of the code look quite similar, some parts of the code look quite a bit more
complex. Not sure if that is due to the desire to also be able to interpret VM disk formats.
Being able to interpret and use all the different virtual disk formats is a task that lies on top of that.
And as such I would think that it would be implemented on top of the generic block backend.
The question is why this did not get committed?
The last updates to the code are 4 months old, and I guess that merging it as it is will not work.
Has Marcel lost interest in continuing with this?

+ @marcel

Having said that, my code is a much simpler approach that just tries to abstract the block interface.

For the most part the abstraction shows up as a search-and-replace change, which has its advantages for sure. For me, the immediate question is: what exactly is being abstracted that needed abstracting?
I'm not saying this is a bad change, but I'm saying that comparing it to what libvdsk aims to achieve is doing a huge disfavor to libvdsk :-)

Being able to interpret and use all the different virtual disk formats is a task that lies on top of that.
And as such I would think that it would be implemented on top of the generic block backend.

I concur. This change does not seem to conflict with libvdsk per se.
That said, in its current form this change also doesn't help. The key problem with bhyve right now is that block devices are explicitly opened, but they are never explicitly closed. It's important, from an abstraction and interfacing perspective, that every open sees a close so that backends (be it via libvdsk or not) can keep state. This is especially important when libvdsk allows writing to virtual disk formats like qcow2 or vmdk.
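
As a rough illustration of that point (all names here are hypothetical), a stateful backend can only flush and free its metadata if every open is matched by a close:

#include <stdlib.h>

struct qcow_state {
	int	fd;		/* backing file */
	void	*l1_table;	/* cached mapping metadata */
};

static void *
qcow_open(const char *path)
{
	struct qcow_state *qs;

	qs = calloc(1, sizeof(*qs));
	if (qs == NULL)
		return (NULL);
	/* open path, read the header, load the lookup tables, ... */
	return (qs);
}

static int
qcow_close(void *arg)
{
	struct qcow_state *qs = arg;

	/* write back any dirty metadata before the guest goes away ... */
	free(qs->l1_table);
	free(qs);
	return (0);
}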

The question is why this did not get committed?
The last updates to the code are 4 months old, and I guess that merging it as it is will not work.
Has Marcel lost interest in continuing with this?

Marcel has mostly lost time :-)

There are a few reasons why this didn't get committed a few months ago:

  1. libvdsk needs a good set of manpages that explain the interface in detail.
  2. To that end, some of the internals still had to be thought through. Things like multi-threading support are something you want to be clear on up front.
  3. The implementation needs to be able to help debug disk-related issues in bhyve, because in bhyve the problems are more likely to show up as random crashes. Tracing support has been added, but I'm not sure it's done enough to say that we have a solid foundation on which to build and improve.
  4. While qcow2 support is there in its early stages, I'm not comfortable exposing write support without a solid unit test suite. We can't really depend on bhyve for testing.
  5. I was hoping to create a common framework for supporting virtual disk formats, because they all have some kind of indexing to map from a logical block address to a physical offset (a rough sketch of that idea follows below). This really goes a long way in getting support for all of the virtual disk formats that mkimg supports, and hopefully with comparable quality. I expect such an approach avoids a lot of duplication.
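
A rough sketch of that common-framework idea (the names are hypothetical; this is not libvdsk code):

#include <sys/types.h>
#include <stdint.h>

struct vdisk_format {
	const char	*name;					/* "raw", "qcow2", "vmdk", "vdi", ... */
	int		(*probe)(int fd);			/* does this backing store use the format? */
	off_t		(*map)(void *state, uint64_t lba);	/* logical block address -> physical offset */
};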

This obviously is setting a high bar and that's why it's not committed. The flip side is that it may never be committed because it's up to one or two people to get the implementation there before the community can help out. As such, it may not be the best approach.

Getting the block_backend multiplexer in will probably change the libvdsk code, in that it'll
be one of the block citizens under the backend multiplexer, and make that part of the code
superfluous.

libvdsk is intended to be "the" backend. Having it be one backend means that there's code duplication between libvdsk and bhyve and that's probably something we should avoid.

My need is simple: I need to be able to write a different backend, and this is for me the smallest
impact on the current code to get that done.
So just for that reason I would prefer that.

Can you elaborate on your use case? What kind of backend is it? I think it'll help reviewers understand this change if they understand the need for it.

Having said that, my code is a much simpler approach that just tries to abstract the block interface.

For the most part the abstraction shows up as a search-and-replace change, which has its advantages for sure. For me, the immediate question is: what exactly is being abstracted that needed abstracting?
I'm not saying this is a bad change, but I'm saying that comparing it to what libvdsk aims to achieve is doing a huge disfavor to libvdsk :-)

I'm comparing your work to mine, because somebody asked me to do just that.
And I have to concur with your conclusion: it is just a tad more than a good search and replace.
Which was also more or less what I intended to write above.

Being able to interpret and use all the different virtual disk formats is a task that lies on top of that.
And as such I would think that it would be implemented on top of the generic block backend.

I concur. This change does not seem to conflict with libvdsk per se.
That said, in its current form this change also doesn't help. The key problem with bhyve right now is that block devices are explicitly opened, but they are never explicitly closed. It's important, from an abstraction and interfacing perspective, that every open sees a close so that backends (be it via libvdsk or not) can keep state. This is especially important when libvdsk allows writing to virtual disk formats like qcow2 or vmdk.

What I'm trying to do is perhaps best illustrated by the main comment in:
https://github.com/qemu/qemu/blob/master/block/rbd.c

/*
 * When specifying the image filename use:
 *
 * rbd:poolname/devicename[@snapshotname][:option1=value1[:option2=value2...]]
 *
 * poolname must be the name of an existing rados pool.
 *
 * devicename is the name of the rbd image.
 *
 * Each option given is used to configure rados, and may be any valid
 * Ceph option, "id", or "conf".
 *
 * The "id" option indicates what user we should authenticate as to
 * the Ceph cluster.  If it is excluded we will use the Ceph default
 * (normally 'admin').
 *
 * The "conf" option specifies a Ceph configuration file to read.  If
 * it is not specified, we will read from the default Ceph locations
 * (e.g., /etc/ceph/ceph.conf).  To avoid reading _any_ configuration
 * file, specify conf=/dev/null.
 *
 * Configuration values containing :, @, or = can be escaped with a
 * leading "\".
 */

So instead of specifying /dev/..... or /path/to/somewhere, allow a different
backend to supply the data. In this case the images are stored in a Ceph cluster. It is one
of the ways KVM is able to migrate online between systems.
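
A purely hypothetical bhyve device specification in that style (the syntax here is only illustrative, nothing is decided) could then look like:

-s 3,ahci-hd,rbd:mypool/myimage:id=admin:conf=/etc/ceph/ceph.conf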

And yes, it would be very interesting if we could layer the virtual disk formats on top of this backend.

Has Marcel lost interest in continuing with this?

Marcel has mostly lost time :-)

Yup, that I recognise. So I usually try to chew off small chunks that are relatively less invasive in the code.
But you are correct in asking what the end goal will be.
So once there is a way to include alternative backends, I'll try and build one based on Ceph librados.

Currently I can map a Ceph RBD image with rbd-ggate onto /dev/ggate0 and then boot bhyve from that.
But that means a lot passes through the userspace/kernel barrier, where access could be serviced directly from the bhyve provider to the Ceph OSDs over the network.

There are a few reasons why this didn't get committed a few months ago:

  1. libvdsk needs a good set of manpages that explain the interface in detail.
  2. To that end, some of the internals still had to be thought through. Things like multi-threading support are something you want to be clear on up front.
  3. The implementation needs to be able to help debug disk-related issues in bhyve, because in bhyve the problems are more likely to show up as random crashes. Tracing support has been added, but I'm not sure it's done enough to say that we have a solid foundation on which to build and improve.
  4. While qcow2 support is there in its early stages, I'm not comfortable exposing write support without a solid unit test suite. We can't really depend on bhyve for testing.
  5. I was hoping to create a common framework for supporting virtual disk formats, because they all have some kind of indexing to map from a logical block address to a physical offset. This really goes a long way in getting support for all of the virtual disk formats that mkimg supports, and hopefully with comparable quality. I expect such an approach avoids a lot of duplication.

Note that I'm really talking about a remote block device, which functions as a remote block disk without much interpretation.

This obviously is setting a high bar and that's why it's not committed. The flip side is that it may never be committed because it's up to one or two people to get the implementation there before the community can help out. As such, it may not be the best approach.

I had more or less that same problem when I started with the Ceph port to FreeBSD. I had this massive patch that had to go in, which was way too much for review. So I tried keeping up with the code and rebased until I was black and blue. After a while I turned it around and cut the PR into smaller but very comprehensible chunks which were very low impact. Those got submitted much more easily, and in the end I got the complete patchset committed. It took a bit longer, but it steadily progressed to the ultimate goal.

Getting the block_backend multiplexer in will probably change the libvdsk code, in that it'll
be one of the block citizens under the backend multiplexer, and make that part of the code
superfluous.

libvdsk is intended to be "the" backend. Having it be one backend means that there's code duplication between libvdsk and bhyve and that's probably something we should avoid.

Like I said before, I do not really want to do this, but I think I have to, because just hacking it into the current code without abstraction is IMHO not the correct way forward.
And net_backends.{ch} is a clear example of how this was fixed in a similar case.

It made me wonder why you wanted to put this in a library, instead of directly in the bhyve code.
But I guess you are aiming to make this reusable by other applications as well?
I guess that what I would need from your code looks more or less like vdisk.{hc} and raw.c and perhaps a bit more glue.
And then a way to provide an RBD image as a block device.

This is a relatively small step to take, and should not impact the current code too much.
After that, all of the other nuts and bolts can be added step by step. And yes, I would like to have tracing, debugging and all the other niceties.
But before you can run, you first need to learn to walk.
If you want, we can take this offline and see if we can work out something minimal that I can use for what I need, but that does not impede you from getting the other parts of libvdsk in at a later stage.

For testing I'm currently just building bhyve, booting an FBSD image, and running buildworld.
It is a rather crude test, but it does thrash the VM quite heavily.

I think this patch is the right direction. Right now it's really very difficult to add a storage backend other than one based on file-like operations.

I think this patch is the right direction. Right now it's really very difficult to add a storage backend other than one based on file-like operations.

I reviewed the code in the libvdsk library more closely, and I think there is a challenge in the layering architecture.
libvdsk still does not solve what I'm trying to do, because it still assumes that the original source of the data comes from the LOCAL file system, even if that could be a mounted file system.
Note that I'm able to run a bhyve instance on a /dev/ggate0 block device. The /dev/ggate? devices are RBD images that are mapped to a geom device and can be used as if they were disks.
I was even impressed by how well it performed.

This is more or less how I see things develop in my head the last few days:

+-----------------------------------+
|          bhyve-guest              |
|                                   |
|         /dev/...                  |
| vtbd?   | ada?   | cd? |          |
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
            ^^^^^^^
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
|           VirtIO                  |
|                                   |
|         Bhyve host                |
|                                   |
+-----------------------------------+
|         Disk formats              |
| raw | qcow1/2 | vmdk | vdi| ..... |
+-----------------------------------+
|         block providers           |
| local | zfsvol | iscsi | RBD |... |
+-----------------------------------+

where the two bottom blocks "could be" in libvdsk, but that need not be the case.
And in both layers we need a multiplexer if we want to be able to plug in other providers.
But both layers could/should be properly decoupled to keep enough abstraction.
And I chose the API currently defined in block_if as the abstraction for the block providers, which made it a cut&paste&search&replace type of PR.
Which I think is a good thing, since it is a low-risk change to start with, and as such it would hopefully get accepted somewhat more easily 8-D

After I saw remarks about D10335: VirtFS/9p filesystem passthrough support (virtio-9p),
I wonder where exactly it goes in the IO stack.
So is the diagram below anywhere near the right place?

+-----------------------------------+
|          bhyve-guest              |
|                                   |
|         /dev/...                  |
|  vtbd   | ada   | cd  | nvme      |
+-----------------------------------+
 ^^^^       ^^^^^^
+-----------------------------------+
| VirtIO  | PCI                     |
+-----------------------------------+
|                    |  lib9p       |
|                    +--------------+
|         Bhyve host                |
|                                   |
+-----------------------------------+
|      Disk formats (libvdsk)       |
| raw | qcow1/2 | vmdk | vdi| ..... |
+-----------------------------------+
| block providers (block_backends)  |
| local | zfsvol | iscsi | RBD |... |
+-----------------------------------+

Addresses the style remarks

bcr accepted this revision as: manpages.Thu, Jan 23, 11:30 AM
bcr added a subscriber: bcr.

OK from manpages.

usr.sbin/bhyve/block_backends.h
85 ↗(On Diff #67191)

Can you avoid changing the name? blockif still seems more appropriate. I think this could be a much smaller diff.

usr.sbin/bhyve/pci_ahci.c
138 ↗(On Diff #67191)

You shouldn't need both be and bctx here. Most of the changes in this file will go away if you keep the existing blockif function and struct names.

usr.sbin/bhyve/pci_virtio_block.c
146 ↗(On Diff #67191)

Likewise here.

usr.sbin/bsdinstall/bsdinstall.8
28 ↗(On Diff #67191)

🤔

usr.sbin/bhyve/block_backends.h
85 ↗(On Diff #67191)

It is a good question, which triggers quite a few ideas in my head.
So I'm just going to answer the questions you asked, because those ideas need to settle a bit.
The challenge is that this is basically a function multiplexer, which has a "bottom" API towards the 'physical' storage devices that are given on the bhyve CLI.
And there is the API towards the "top" that the remainder of the bhyve host connects to.
Now how much are they the same, and how much can/should be hidden?

I did this rename on purpose, since I wanted to emphasize that there is quite a difference between the old blockif and the newer backend multiplexer. Also note that at first the backend mux and the locblk provider were in the same files, which made it even more confusing to keep the old name.

But there is a difference between block_backend_t and blockif_ctxt_t.
The first one (block_backend_t) describes the block provider and holds the jump table to the provider functions.
The second structure (blockif_ctxt_t) describes the device that is used for this specific connection to a backing storage,
like:

struct blockif_ctxt {
	int			bc_magic;
	int			bc_fd;
	int			bc_ischr;
	int			bc_isgeom;
	int			bc_candelete;
	int			bc_rdonly;
	off_t			bc_size;
	int			bc_sectsz;
	int			bc_psectsz;
	int			bc_psectoff;
	int			bc_closing;
	pthread_t		bc_btid[BLOCKIF_NUMTHR];
	pthread_mutex_t		bc_mtx;
	pthread_cond_t		bc_cond;

	/* Request elements and free/pending/busy queues */
	TAILQ_HEAD(, blockif_elem) bc_freeq;       
	TAILQ_HEAD(, blockif_elem) bc_pendq;
	TAILQ_HEAD(, blockif_elem) bc_busyq;
	struct blockif_elem	bc_reqs[BLOCKIF_MAXREQ];
};

Which is the original blockif_ctxt_t. And I feel uncomfortable just gluing this into the structure that describes the backend.

usr.sbin/bhyve/pci_ahci.c
138 ↗(On Diff #67191)

They are not the same, as I previously explained.
But perhaps struct blockif_ctxt *bctx should be an opaque pointer, so each type of backing storage can put its data in a separate structure.

The problem with hanging everything off a blockif_ctxt structure is that that struct is very much tailored to the current properties of a physical disk, and other backing stores might need other data.
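
As a hypothetical example of such an opaque pointer, only an rbd backend would know the layout of its own state, while the generic blockif code just carries the pointer around:

#include <sys/types.h>

struct rbd_devstate {		/* known only to the rbd backend */
	void	*cluster;	/* rados cluster handle */
	void	*image;		/* opened rbd image */
	off_t	 size;		/* image size reported to the guest */
};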

usr.sbin/bhyve/pci_virtio_block.c
146 ↗(On Diff #67191)

Well, here struct blockif_ctxt *bc can go, because it is not referenced.
But note that throughout the code, the backend descriptor is the top component that is carried around and the blockif is the child of that backend instance.

usr.sbin/bsdinstall/bsdinstall.8
28 ↗(On Diff #67191)

hmmmmm,

Blame this on my unfamiliarity with arc.
I did arc diff --update D23010
which should have been arc diff --update D23010 usr.sbin/bhyve

Remove bsdinstall diffs that got in this set by accident.

Remove struct blockif_ctxt *bc from struct pci_vtblk_softc.
If one needs block device info, it can be obtained through the backend descriptor.

Fixed a few more compile errors.

Reversed the way the backend selector is integrated in blockif.
Now every blockif_ctxt is extended with a reference to a backend provider.
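
With that layout the existing blockif_*() entry points stay in place and simply forward through the per-device backend reference, roughly like this (a sketch, not the literal diff):

int
blockif_close(struct blockif_ctxt *bc)
{
	return (bc->be->close(bc));
}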

wjw_digiware.nl edited the summary of this revision. (Show Details)Sun, Jan 26, 11:54 PM

I'll be taking a closer look when I get a chance but at a glance at the diff this looks like a much cleaner change now. Thank you for taking the time to work on it!

bhyve/blocklocal: drop external references to init() and cleanup()

init() and cleanup() are calls that are only referred to from within
the actual backend, so block_if does not need to know about them.

usr.sbin/bhyve/block_if.c
47–49

Indentation looks different than in the rest of the file.

117

!= NULL please to avoid confusion about the type.

136

This isn't the only place I see this, but I'll just point it out here:
Minor style nit - Values in return statements should be enclosed in parentheses.

return (bc->be->close(bc));
usr.sbin/bhyve/block_if.h
144

Please prefix the struct members with bb_.

185

I think it would be a bit simpler to use an opaque pointer for private data.

Ah, Willem, this week we have SUSE Hack Week, and I was not aware that you had already been working on an rbd driver for bhyve, and picked this as my Hack Week project. And just today, accidentally (after your comment on my qemu review), I noticed your review request. I already have some working code [1]. It also adds an abstraction layer in a way similar to yours and has a working driver. It still needs some work: right now librados and librbd from ports are just linked to bhyve, which is not acceptable for upstream; I think we need some solution to dynamically load the libraries (or even the rbd driver) at run time. Anyway, I already tried it and it looks like it works.

So it looks like unfortunately we did duplicate work. But I am glad you are working on it and would be happy if your version goes upstream, as you started to work on it earlier. You could look at my version for some ideas though. As it already has an rbd driver, it might make it clearer to reviewers what the goal of adding this abstraction is.

[1] https://github.com/freebsd/freebsd/compare/master...trociny:bhyve-rbd

Ah, Willem, this week we have SUSE Hack Week, and I was not aware that you had already been working on an rbd driver for bhyve, and picked this as my Hack Week project. And just today, accidentally (after your comment on my qemu review), I noticed your review request. I already have some working code [1]. It also adds an abstraction layer in a way similar to yours and has a working driver. It still needs some work: right now librados and librbd from ports are just linked to bhyve, which is not acceptable for upstream; I think we need some solution to dynamically load the libraries (or even the rbd driver) at run time. Anyway, I already tried it and it looks like it works.
So it looks like unfortunately we did duplicate work. But I am glad you are working on it and would be happy if your version goes upstream, as you started to work on it earlier. You could look at my version for some ideas though. As it already has an rbd driver, it might make it clearer to reviewers what the goal of adding this abstraction is.
[1] https://github.com/freebsd/freebsd/compare/master...trociny:bhyve-rbd

Great, I was indeed wondering where the QEMU fix all of a sudden came from, and when you'd do something for bhyve.

Well, if nothing else, your work is also "proof" that we should at least have something like a block-device multiplexer.
And I usually submit work in small steps to give reviewers the opportunity to get a good feel for the code, without
too much distraction from putting 2 problems in 1 PR.

The last three weeks have been rather busy with $JOB, so I have not yet addressed the comments here,
and getting things committed does not have the same pace as the Ceph PRs have.
But I'll have time this weekend to work on my Ceph stuff, and I'll look at your Git as well.

And yes, in the end I think it will require some DL-type loading of this "driver".
Perhaps create a directory like /usr/libexec/bhyve/blocks and load block_<driver>.so if there is
a reference to the driver in the PCI specification on the command line.
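
A rough sketch of that kind of run-time loading, assuming the directory layout suggested above and a made-up exported symbol name:

#include <dlfcn.h>
#include <stdio.h>

static struct block_backend *
block_backend_load(const char *name)
{
	char path[256];
	void *hdl;

	snprintf(path, sizeof(path), "/usr/libexec/bhyve/blocks/block_%s.so", name);
	hdl = dlopen(path, RTLD_NOW | RTLD_LOCAL);
	if (hdl == NULL)
		return (NULL);
	/* "block_backend_entry" is a hypothetical symbol each module would export */
	return (dlsym(hdl, "block_backend_entry"));
}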

trociny added inline comments.Wed, Feb 12, 3:50 PM
usr.sbin/bhyve/block_if.c
104

BTW, in my version I just check optstr for a prefix ending with a colon (e.g. 'rbd:'), and then use this prefix to find the driver.

wjw_digiware.nl marked 5 inline comments as done.

usr.sbin/bhyve: Addressed comments from @ryan_freqlabs.com

usr.sbin/bhyve/block_if.c
104

@trociny

I think it is actually better to leave this up to the backends themselves; this creates more flexibility. And it would for example allow for multiple /........ backends, where if the basic local backend refuses, there could be other backends working on this optstr that can handle the request.

136

I have not fixed this in files I have not touched.

trociny added inline comments.Fri, Feb 14, 2:52 PM
usr.sbin/bhyve/block_if.c
104

I can hardly imagine how backends are going to detect whether it is "their" device, unless they rely on some prefix name. And if they do, why not do this in a generic way instead of implementing it in every driver?

So, bhyve would look at the provided disk name, and if it had a "foo:" prefix, it would assume that the driver is "foo", and would try to open the device using this driver (if it had been successfully loaded).

usr.sbin/bhyve/block_if.c
104

I agree with the scheme: suggestion, i.e. file://, rbd://, nbd://, etc., with a fallback to the file backend if no scheme is given. Each backend can register its scheme name and we can directly pick the right one with no ambiguity.

usr.sbin/bhyve/block_if.c
104

Oops, I meant to remove // from the rest of those before I submitted. Anything after the : would be passed to the backend.

usr.sbin/bhyve/block_if.c
104

Right, and perhaps keep the /...... for legacy, defaulting to the local store.

104

@trociny

I guess it would be possible to add a scheme descriptor to the backend struct, and only go with that.
But then you would still have to call bb_open() to see if the optstr is a valid string and attach the device.

usr.sbin/bhyve: Add bb_sche to a backend to specify which backend to use

usr.sbin/bhyve: Added possible scheme to the man page

trociny added inline comments.Fri, Feb 14, 7:40 PM
usr.sbin/bhyve/block_if.c
102

What is the point of passing the scheme prefix to the driver? The first thing it would need to do is trim it. Why not instead always pass optstr without the scheme to bb_open? It is easy to trim it from an optstr that has it -- just advance optstr by the prefix length.

114

Note, bb_open may return NULL here and you will be dereferencing NULL

usr.sbin/bhyve/block_if.h
146

nit: I would call it bb_name and store the backend name without ending ':' ("file", "rbd").

usr.sbin/bhyve/block_if.c
102

Yeah, that is the other way around .....
It makes it easier here, but means a bit more code in the for loop.

114

good call

usr.sbin/bhyve/block_if.h
146
  1. Sure, can do.
  2. That makes the code only more convoluted.

What if optstr contains /file/? Then it would require checking that strstr returns the start of optstr AND that bb_name != "".
And then we would still need to go out and find the ":".
But I'll give it a shot.

trociny added inline comments.Fri, Feb 14, 8:52 PM
usr.sbin/bhyve/block_if.c
102

Otherwise you would have "a bit more code" in every driver to parse the prefix.

usr.sbin/bhyve/block_if.h
146

I am not quite sure I follow. I suggest not storing ':' in the backend name, but sure, the ':' should be present in optstr to separate the scheme (backend) name from the rest of the params. When parsing optstr for the backend name you should look for the first occurrence of the ':' character and use the part of the string before ':' as the backend name if all characters there are alphanumeric. And the part of the string after ':' is passed to the backend open function (if you found a backend with such a name).
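
A small sketch of that parsing, following the description above (find_backend_by_name() is a hypothetical lookup into the registered backends):

#include <ctype.h>
#include <string.h>

static struct block_backend *find_backend_by_name(const char *name, size_t namelen);	/* hypothetical */

static struct block_backend *
block_backend_for(const char *optstr, const char **rest)
{
	const char *colon, *p;

	colon = strchr(optstr, ':');
	if (colon != NULL && colon != optstr) {
		for (p = optstr; p < colon; p++)
			if (!isalnum((unsigned char)*p))
				break;
		if (p == colon) {
			*rest = colon + 1;	/* backend gets optstr without the scheme */
			return (find_backend_by_name(optstr, (size_t)(colon - optstr)));
		}
	}
	*rest = optstr;				/* legacy: plain paths fall back to the local/file backend */
	return (find_backend_by_name("file", 4));
}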

usr.sbin/bhyve: Update the backend selection code

usr.sbin/bhyve: Make the legacy case operational

@trociny
BTW: I'm running my tests like

bhyve-rbd -H -P -A -c 4 -m 2G -l com1,/dev/nmdm1213A -s 0,hostbridge -s 1,lpc -s 2,virtio-net,tap1213 -s 3,ahci-hd,file:/dev/ggate0 -s 4,ahci-hd,/dev/ggate1 FBSD1213

So we are already running off Ceph disks, albeit in an odd way.