Page MenuHomeFreeBSD

usr.sbin/bhyve: Implement a generic block store backend interf.
Needs ReviewPublic

Authored by wjw_digiware.nl on Jan 2 2020, 5:11 PM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Nov 26, 10:39 PM
Unknown Object (File)
Tue, Nov 26, 10:52 AM
Unknown Object (File)
Tue, Nov 26, 5:27 AM
Unknown Object (File)
Sat, Nov 23, 4:09 AM
Unknown Object (File)
Sat, Nov 16, 2:00 PM
Unknown Object (File)
Fri, Nov 8, 6:04 AM
Unknown Object (File)
Oct 29 2024, 8:55 AM
Unknown Object (File)
Oct 24 2024, 6:41 AM

Details

Reviewers
mav
freqlabs
Group Reviewers
bhyve
manpages
Summary

To be able to plugin different storage systems an abstraction
layer for the block storage is "required".

This is currently implemented in the existing block_if.{hc}
files.
The current callee access routings have their prefixes renamed
from blockif_ to blocklocal_.
And have moved into a separate pair of files block_local.{hc}
Functional there should not be an change, other that the
calls to the storage API now go thru a an extra set of tabled
functions.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 29378
Build 27271: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Remove bsdinstall diffs that got in this set by accident.

Remove struct blockif_ctxt *bc from struct pci_vtblk_softc.
If one needs block device info, you can get it thru the backend-descriptior

fixed a few more compile errors.

Reversed the way the backend selector is integrated in blockif.
Now every blockif_ctxt is extended with a reference to a backend provider.

I'll be taking a closer look when I get a chance but at a glance at the diff this looks like a much cleaner change now. Thank you for taking the time to work on it!

bhyve/blocklocal: do external refence to init() and cleanup()

init() and cleanup() are calls that are only refered to from with
the actual backend. So block_if does not need to know them.

usr.sbin/bhyve/block_if.c
47–48

Indentation looks different than in the rest of the file.

111

!= NULL please to avoid confusion about the type.

130

This isn't the only place I see this, but I'll just point it out here:
Minor style nit - Values in return statements should be enclosed in parentheses.

return (bc->be->close(bc));
usr.sbin/bhyve/block_if.h
144

Please prefix the struct members with bb_.

185

I think it would be a bit simpler to use an opaque pointer for private data.

Ah, Willem, this week we have SUSE Hack Week, and I was not aware that you had already been working on rbd driver for bhyve, and picked this as my Hack Week project. And just today accidentally (after your comment to my qemu review) I noticed your review request. I already have some working code [1]. It also adds an abstraction layer in a way similar to yours and has a working driver. It still needs some work: right now librados and librbd from ports are just linked to bhyve, which is not acceptable for upstream, I think we need some solution to try dynamically load the libraries (or even rbd driver) at run time. Anyway I already tried it and it looks like work.

So it looks like unfortunately we did duplicate work. But I am glad you are working on it and would be happy if your version goes upstream, as you started to work on it early. You could look at my version for some ideas though. As it already has rbd driver it might be clearer for reviewers what is the goal of adding this abstraction.

[1] https://github.com/freebsd/freebsd/compare/master...trociny:bhyve-rbd

Ah, Willem, this week we have SUSE Hack Week, and I was not aware that you had already been working on rbd driver for bhyve, and picked this as my Hack Week project. And just today accidentally (after your comment to my qemu review) I noticed your review request. I already have some working code [1]. It also adds an abstraction layer in a way similar to yours and has a working driver. It still needs some work: right now librados and librbd from ports are just linked to bhyve, which is not acceptable for upstream, I think we need some solution to try dynamically load the libraries (or even rbd driver) at run time. Anyway I already tried it and it looks like work.

So it looks like unfortunately we did duplicate work. But I am glad you are working on it and would be happy if your version goes upstream, as you started to work on it early. You could look at my version for some ideas though. As it already has rbd driver it might be clearer for reviewers what is the goal of adding this abstraction.

[1] https://github.com/freebsd/freebsd/compare/master...trociny:bhyve-rbd

Great, I was indeed wondering where the QEMU fix al of a sudden caem from, and when you'd do something for bhyve.

Well. if nothing else, your work is also the "proof" that we at least should have something like a block-device multiplexer.
And I'm usually submitting work in small steps to give reviewers the oportunity to get a good feel of the code, without
too much distraction of putting 2 problems in 1 PR.

Last three weeks have been rather busy with $JOB, so have not yet adressed the comments here,
And getting things commited does not have the same pace as the Ceph PRs have.
But I'll have time this weekend to work on my Ceph stuff. And I'll look at yout Git as well.

And yes, in the end I thing it will require something of a DL-type of loading this "driver".
Perhaps create a directory like /usr/libexec/bhyve/blocks and load block_<driver>.so if there is
a reference to the driver on the PCI specification in the command-line.

usr.sbin/bhyve/block_if.c
98

BTW, in my version I just check optstr for a prefix ending with semicolon (e.g. 'rbd:'), and then use this prefix to find the driver.

wjw_digiware.nl marked 5 inline comments as done.

usr.sbin/bhyve: Addressed comments from @ryan_freqlabs.com

usr.sbin/bhyve/block_if.c
98

@trociny

I think it is beter actually to leave this up to the backends themselfs, this creates more flexibility. And would for example allow for multiple /........ backends. Where if the basic local backend refuses, there could be other backends working on this optstr that can handle the request.

130

I have not fixed this in files I have not touched.

usr.sbin/bhyve/block_if.c
98

I can hardly imagine how backends are going to detect if its "their" device, if only they would not rely on some prefix name. And if they would why not to do this in generic way instead of implementing this in every driver?

So, bhyve would look for provided disk name, and if it had "foo:" prefix, it would assume that the driver is "foo", and would try to open the device using this driver (if it had been successfully loaded.

usr.sbin/bhyve/block_if.c
98

I agree with scheme: suggest, ie file://, rbd://, nbd://, etc. with a fallback to the file backend if no scheme is given. Each backend can register its scheme name and we can directly pick the right one with no ambiguity.

usr.sbin/bhyve/block_if.c
98

Oops, I meant to remove // from the rest of those before I submit. Anything after the : would be passed to the backend.

usr.sbin/bhyve/block_if.c
98

Right, and perhaps keep the /...... for legacy defaulting to local store.

98

@trociny

I guess it would be possible to add a scheme descriptor to the backend struct, and only go with that.
But then still you'd have to call bb_open() to see it the optstr is a valid string and attach the device.

usr.sbin/bhyve: Add bb_sche to a backend to specify which backend to use

usr.sbin/bhyve: Added possible scheme to the man page

usr.sbin/bhyve/block_if.c
79

Note, bb_open may return NULL here and you will be dereferencing NULL

87

What is a point of passing the scheme prefix to the driver? The first step that it would need to do is to trim it. Why not instead always pass to the bb_open optstr without scheme. It is easy to trim it from the optstr that has it -- just increase optstr by the prefix length.

usr.sbin/bhyve/block_if.h
146

nit: I would call it bb_name and store the backend name without ending ':' ("file", "rbd").

usr.sbin/bhyve/block_if.c
79

good call

87

Yeah, that is the other way around .....
Makes it easier here, but a bit more code in the FOR loop

usr.sbin/bhyve/block_if.h
146
  1. Sure, can do
  2. that makes the code only more convoluted.

what if optstr contains /file/, so it will require checking if strstr returns start of optstr AND that bb_name != ""
And then we need still to go out and find the ":"
But I'll give it a shot.

usr.sbin/bhyve/block_if.c
87

Otherwise you would have "a bit mode code" in every driver to parse the prefix.

usr.sbin/bhyve/block_if.h
146

I am not quite sure I follow. I suggest not to store ':' in the backend name, but sure the ':' should be present in optstr to separate scheme (backend) name from the rests of params. When parsing optstr for the backend name you should look for the first occurrence of ':' character and use the part of the string before ':' as a backend name if all characters there are alphanumeric. And the part of the string after ':' is passed to the backend open function (if you found the backend with such a name).

usr.sbin/bhyve: Update the backend selection code

usr.sbin/bhyve: Make the legacy case operational

@trociny
BTW: I'm running my tests like

bhyve-rbd -H -P -A -c 4 -m 2G -l com1,/dev/nmdm1213A -s 0,hostbridge -s 1,lpc -s 2,virtio-net,tap1213 -s 3,ahci-hd,file:/dev/ggate0 -s 4,ahci-hd,/dev/ggate1 FBSD1213

So we are already running of Ceph disks, be it in an odd way.

usr.sbin/bhyve/block_if.c
54

If you null out the ':' before moving the pointer forward then you can use strcmp below to ensure we get an exact match for the backend name.

*optrest = '\0';
optrest++;
60–61

This comment is outdated.

81–82

We can break if open fails. There should be only one backend with the given name.

usr.bin/bhyve: implement code improvements

I guess I addresses a suggestions thusfar.

This is getting really close to the finish line.

usr.sbin/bhyve/block_if.h
185

void * is more appropriate for private data.

usr.sbin/bhyve/block_local.c
813

These names seem backwards to me. What do you think about this instead?

.bb_name      = "blk-local",
.bb_scheme    = "file",

Then name would be the thread name and scheme would be the scheme name for matching the backend.

This revision now requires changes to proceed.Feb 25 2020, 3:25 PM

usr.sbin/bhyve: change opague pointer, and rewrote to bb_threadinfo

wjw_digiware.nl added inline comments.
usr.sbin/bhyve/block_local.c
813

This you will have to fight with @trociny, since he did not like the name scheme.
Note that ATM, bb_threadinfo is not used, since we only use what the parent
inserted. So it is things like `"%d:%d", pi->pi_slot, pi->pi_func that set the threadname.

usr.sbin/bhyve/block_local.c
813

My point actually was why have both bb_name|prefix and bb_scheme? Why don't just have one param: bb_name, so it is both the thread name and scheme name, whatever. It looks for me perfectly well if one have to specify a device like 'blk-local:/path/to/dev_or_file'

freqlabs added inline comments.
usr.sbin/bhyve/block_local.c
813

I had not noticed that bb_threadinfo isn't used by the blockif abstraction. That being the case, just get rid of it. If a backend needs to store some thread info for its own threads, it can keep it in the private data.

With that out of the way I don't mind much whether the remaining member is called bb_name or bb_scheme.

This revision is now accepted and ready to land.Feb 26 2020, 9:08 AM

usr.sbin/bhyve: remove threadinfo string from block device description

This revision now requires review to proceed.Feb 26 2020, 11:46 AM

usr.sbin/bhyve: Make the access decriptor more flexible

For access of the disks/files we use a filedescriptor.
But there are other storage services that need something different then just
an integer.

Changed bc_fd into a union that holds the FD, but also can be used a opague
pointer.

usr.sbin/bhyve: addressed comments by trociny and asommers

Revered typedef
Changed legacy "file" handling
Implemented dynamic loading of external blockdrivers

usr.sbin/bhyve: Fix setting for dynamic loading

usr.sbin/bhyve: Fixed DATA_SSET() usage