Page MenuHomeFreeBSD

bhyve: abstraction for network backends
ClosedPublic

Authored by vmaffione on Jun 16 2019, 9:00 AM.

Details

Summary

Bhyve can currently emulate two virtual NICs, namely virtio-net and e1000, and
connect to the host network through two backends, namely tap and netmap.
However, there is no interface between virtual NIC functionalities and backend
functionalities. As a result, the backend code is duplicated between the two virtual
NIC implementations and also within the same virtual NIC. Also, e1000 cannot
currently use netmap as a backend.
This patch introduces a network backend API between virtio-net/e1000 and
tap/netmap, to improve code reuse and add missing functionalities.
Virtual NICs and backends can negotiate virtio-net features, such as checksum
offload and TSO. If the backend supports the features, it will propagate this
information to the guest, so that the latter can make use of them. Currently,
only netmap VALE ports support the features, but support should be added to
tap in the future.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

vmaffione created this revision.Jun 16 2019, 9:00 AM
vmaffione created this object with visibility "No One".
vmaffione retitled this revision from bhyve: abstraction for network backends to bhyve: abstraction for network backends (WIP).Jun 16 2019, 9:01 AM
vmaffione removed a reviewer: bhyve.
vmaffione updated this revision to Diff 58690.Jun 16 2019, 10:59 AM
vmaffione updated this revision to Diff 58898.Jun 22 2019, 3:56 PM
vmaffione updated this revision to Diff 58965.Jun 24 2019, 8:26 PM
vmaffione retitled this revision from bhyve: abstraction for network backends (WIP) to bhyve: abstraction for network backends.Jun 24 2019, 8:52 PM
vmaffione edited the summary of this revision. (Show Details)
vmaffione added reviewers: jhb, bryanv.
vmaffione changed the visibility from "No One" to "Public (No Login Required)".
jhb added a comment.Jun 27 2019, 7:53 PM

I like the idea in general.

usr.sbin/bhyve/net_backends.c
122 ↗(On Diff #58965)

Most of the comments are correct, but the style would be to put the leading '/*' and trailing '*/' on their own lines.

A few other comments in the patch also are not quite correct.

208 ↗(On Diff #58965)

FreeBSD's style is to put ()'s around return values, so:

return (0);

here and elsewhere throughout the patch.

257 ↗(On Diff #58965)

I think it would be simpler to just have two backend structs for tap vs vmnet and then you don't need all the parsing logic for handling '|', etc. Just have a single 'prefix' and use simple 'strncmp' matching without needing all the token, etc. Having an extra struct is pretty cheap in exchange for the code simplicity.

346 ↗(On Diff #58965)

style(9) also does a 4-space continuation indent.

363 ↗(On Diff #58965)

Maybe use strlcpy() instead?

404 ↗(On Diff #58965)

Is this really better than using memcpy() directly? Note that memcpy() in libc on both Linux and FreeBSD will use optimized routines for the current CPU.

786 ↗(On Diff #58965)

Nit: spaces around '+'

usr.sbin/bhyve/pci_e82545.c
1047 ↗(On Diff #58965)

Does 'inline' actually make a difference? I recently found in another patch that clang ignored this and made up its own mind to inline or not.

usr.sbin/bhyve/pci_virtio_net.c
319 ↗(On Diff #58965)

This seems unrelated (separate change) and is also odd style). Can you instead move this code to when the thread is being created as I think that is more common in the existing bhyve source?

476 ↗(On Diff #58965)

Hmm, the name is already set here?

vmaffione updated this revision to Diff 59117.Jun 27 2019, 9:47 PM
vmaffione marked 10 inline comments as done.

Tried to address jhb's comments.

Thanks for the in depth review. Sorry about the style issues. Is there a FreeBSD clang-format file against which I can validate code?

usr.sbin/bhyve/net_backends.c
122 ↗(On Diff #58965)

Yes, indeed. Thanks for spotting them.

257 ↗(On Diff #58965)

I knew this was going to come up... Yes, it makes sense.

363 ↗(On Diff #58965)

Thanks, this slipped through during my rebase.

404 ↗(On Diff #58965)

Agreed. This is something actually very old that I still had around.

usr.sbin/bhyve/pci_e82545.c
1047 ↗(On Diff #58965)

For sure it doesn't for e1000, so in this case it's useless.

usr.sbin/bhyve/pci_virtio_net.c
319 ↗(On Diff #58965)

Indeed, this is a leftover of a previous iteration. Thanks.

jhb accepted this revision.Jun 27 2019, 11:01 PM

Just a few minor style nits, but I think this is ok once those are addressed.

usr.sbin/bhyve/net_backends.c
37 ↗(On Diff #59117)

Sorry I missed this earlier, sys/types.h should always be first when it is used.

39 ↗(On Diff #59117)

Ideally the other net/*.h headers would be here with net/if.h.

pci_virtio_net.c also had the capsicum headers sorted into the regular blocks.

225 ↗(On Diff #59117)

return() here as well.

271 ↗(On Diff #59117)

s/ot/of/

362 ↗(On Diff #59117)

return ()

371 ↗(On Diff #59117)

return ()

532 ↗(On Diff #59117)

I suspect this is longer than 80 cols?

578 ↗(On Diff #59117)

s/ot/of/

617 ↗(On Diff #59117)

I prefer to use 'strncmp(...) == 0' rather than !strncmp, and style(9) requests to only use ! for booleans.

666 ↗(On Diff #59117)

return ()

727 ↗(On Diff #59117)

return ()

797 ↗(On Diff #59117)

return ()

This revision is now accepted and ready to land.Jun 27 2019, 11:01 PM
usr.sbin/bhyve/net_backends.c
785 ↗(On Diff #59117)

What do you think about adding this function to the backend API. For netmap, there is no need to copy data into a dummy buffer, we can just move ring pointers.

usr.sbin/bhyve/pci_virtio_net.c
227 ↗(On Diff #59117)

I think len < 0 must be handled:

if (len < 0) {
   	vq_endchains(vq, 0);
        return;
}

Otherwise, we call vq_relchain(vq, idx, (uint32_t)len); with incorrect parameter and increment used_idx.

424 ↗(On Diff #59117)

Netmap adds TSO flags to the VTNET_S_HOSTCAPS flags, but the latter contain the VIRTIO_NET_F_MRG_RXBUF flag which is not yet supported.

I was able to run this patch only by removing VIRTIO_NET_F_MRG_RXBUF from VTNET_S_HOSTCAPS.

vmaffione updated this revision to Diff 59224.Jun 30 2019, 7:59 PM
vmaffione marked 15 inline comments as done.

Addressed coments from reviewers.

This revision now requires review to proceed.Jun 30 2019, 7:59 PM

Thanks for the review. I actually don't have yet a real testbed for this (will get one soon), and I can only do functional tests in a nested KVM VM. Any test is welcome.

usr.sbin/bhyve/net_backends.c
39 ↗(On Diff #59117)

Done. I hope I did not miss anything.

225 ↗(On Diff #59117)

Ok, I thought the rule was not valid for return statements containing a function call.

785 ↗(On Diff #59117)

Good point, but I think it's not convenient in practice, and for two reasons. First, once I implement receive backpressure, discard won't happen anymore (at least that's the plan). Second, making this operation faster would be counter productive, since we are giving less time to the guest to replenish. The result is more packet drops.

usr.sbin/bhyve/pci_virtio_net.c
227 ↗(On Diff #59117)

You are right. I changed the code so to handle len<=0

424 ↗(On Diff #59117)

Indeed, my original patch had VIRTIO_NET_F_MRG_RXBUF removed. Thanks for spotting.
With the mergeable rx buffers feature on, the guest will indeed fail receiving large packets with small buffers.

I tested the latest version of the patch on real hardware with different operating systems and different MTUs, using iperf3 and ping -f -c. It seems everything works as expected.
VM1 (Ubuntu 16.04) - VALE - VM2 (Ubuntu 16.04) iperf3 speed 16 Gbits/s
VM1 (FreeBSD 12) - VALE - VM2 (FreeBSD 12) iperf3 speed 11 Gbits/s
VM1 (Ubuntu 16.04) - VALE - VM2 (FreeBSD 12) iperf3 speed 14 Gbits/s

Thanks. This touches also TAP and setups, so we need tests with TAP.
Also, we would need to check that TAP+e1000 works.

bryanv accepted this revision as: bryanv.Jul 5 2019, 4:57 PM
This revision is now accepted and ready to land.Jul 5 2019, 4:57 PM
This revision was automatically updated to reflect the committed changes.