Page MenuHomeFreeBSD

bhyve: abstraction for network backends
ClosedPublic

Authored by vmaffione on Jun 16 2019, 9:00 AM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Jan 16, 8:50 AM
Unknown Object (File)
Tue, Jan 14, 7:16 AM
Unknown Object (File)
Mon, Jan 13, 2:03 AM
Unknown Object (File)
Sat, Jan 11, 9:04 PM
Unknown Object (File)
Dec 15 2024, 3:28 AM
Unknown Object (File)
Dec 14 2024, 5:04 PM
Unknown Object (File)
Dec 1 2024, 2:24 AM
Unknown Object (File)
Nov 22 2024, 2:09 PM

Details

Summary

Bhyve can currently emulate two virtual NICs, namely virtio-net and e1000, and
connect to the host network through two backends, namely tap and netmap.
However, there is no interface between virtual NIC functionalities and backend
functionalities. As a result, the backend code is duplicated between the two virtual
NIC implementations and also within the same virtual NIC. Also, e1000 cannot
currently use netmap as a backend.
This patch introduces a network backend API between virtio-net/e1000 and
tap/netmap, to improve code reuse and add missing functionalities.
Virtual NICs and backends can negotiate virtio-net features, such as checksum
offload and TSO. If the backend supports the features, it will propagate this
information to the guest, so that the latter can make use of them. Currently,
only netmap VALE ports support the features, but support should be added to
tap in the future.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 25014

Event Timeline

vmaffione created this object with visibility "No One".
vmaffione retitled this revision from bhyve: abstraction for network backends to bhyve: abstraction for network backends (WIP).Jun 16 2019, 9:01 AM
vmaffione removed a reviewer: bhyve.
vmaffione retitled this revision from bhyve: abstraction for network backends (WIP) to bhyve: abstraction for network backends.Jun 24 2019, 8:52 PM
vmaffione edited the summary of this revision. (Show Details)
vmaffione added reviewers: jhb, bryanv.
vmaffione changed the visibility from "No One" to "Public (No Login Required)".

I like the idea in general.

usr.sbin/bhyve/net_backends.c
122

Most of the comments are correct, but the style would be to put the leading '/*' and trailing '*/' on their own lines.

A few other comments in the patch also are not quite correct.

208

FreeBSD's style is to put ()'s around return values, so:

return (0);

here and elsewhere throughout the patch.

257

I think it would be simpler to just have two backend structs for tap vs vmnet and then you don't need all the parsing logic for handling '|', etc. Just have a single 'prefix' and use simple 'strncmp' matching without needing all the token, etc. Having an extra struct is pretty cheap in exchange for the code simplicity.

346

style(9) also does a 4-space continuation indent.

363

Maybe use strlcpy() instead?

404

Is this really better than using memcpy() directly? Note that memcpy() in libc on both Linux and FreeBSD will use optimized routines for the current CPU.

786

Nit: spaces around '+'

usr.sbin/bhyve/pci_e82545.c
1047

Does 'inline' actually make a difference? I recently found in another patch that clang ignored this and made up its own mind to inline or not.

usr.sbin/bhyve/pci_virtio_net.c
319

This seems unrelated (separate change) and is also odd style). Can you instead move this code to when the thread is being created as I think that is more common in the existing bhyve source?

476

Hmm, the name is already set here?

vmaffione marked 10 inline comments as done.

Tried to address jhb's comments.

Thanks for the in depth review. Sorry about the style issues. Is there a FreeBSD clang-format file against which I can validate code?

usr.sbin/bhyve/net_backends.c
122

Yes, indeed. Thanks for spotting them.

257

I knew this was going to come up... Yes, it makes sense.

363

Thanks, this slipped through during my rebase.

404

Agreed. This is something actually very old that I still had around.

usr.sbin/bhyve/pci_e82545.c
1047

For sure it doesn't for e1000, so in this case it's useless.

usr.sbin/bhyve/pci_virtio_net.c
319

Indeed, this is a leftover of a previous iteration. Thanks.

Just a few minor style nits, but I think this is ok once those are addressed.

usr.sbin/bhyve/net_backends.c
38

Sorry I missed this earlier, sys/types.h should always be first when it is used.

40

Ideally the other net/*.h headers would be here with net/if.h.

pci_virtio_net.c also had the capsicum headers sorted into the regular blocks.

226

return() here as well.

272

s/ot/of/

363

return ()

372

return ()

533

I suspect this is longer than 80 cols?

579

s/ot/of/

618

I prefer to use 'strncmp(...) == 0' rather than !strncmp, and style(9) requests to only use ! for booleans.

667

return ()

728

return ()

798

return ()

This revision is now accepted and ready to land.Jun 27 2019, 11:01 PM
usr.sbin/bhyve/net_backends.c
786

What do you think about adding this function to the backend API. For netmap, there is no need to copy data into a dummy buffer, we can just move ring pointers.

usr.sbin/bhyve/pci_virtio_net.c
227–228

I think len < 0 must be handled:

if (len < 0) {
   	vq_endchains(vq, 0);
        return;
}

Otherwise, we call vq_relchain(vq, idx, (uint32_t)len); with incorrect parameter and increment used_idx.

432

Netmap adds TSO flags to the VTNET_S_HOSTCAPS flags, but the latter contain the VIRTIO_NET_F_MRG_RXBUF flag which is not yet supported.

I was able to run this patch only by removing VIRTIO_NET_F_MRG_RXBUF from VTNET_S_HOSTCAPS.

vmaffione marked 15 inline comments as done.

Addressed coments from reviewers.

This revision now requires review to proceed.Jun 30 2019, 7:59 PM

Thanks for the review. I actually don't have yet a real testbed for this (will get one soon), and I can only do functional tests in a nested KVM VM. Any test is welcome.

usr.sbin/bhyve/net_backends.c
40

Done. I hope I did not miss anything.

226

Ok, I thought the rule was not valid for return statements containing a function call.

786

Good point, but I think it's not convenient in practice, and for two reasons. First, once I implement receive backpressure, discard won't happen anymore (at least that's the plan). Second, making this operation faster would be counter productive, since we are giving less time to the guest to replenish. The result is more packet drops.

usr.sbin/bhyve/pci_virtio_net.c
227–228

You are right. I changed the code so to handle len<=0

432

Indeed, my original patch had VIRTIO_NET_F_MRG_RXBUF removed. Thanks for spotting.
With the mergeable rx buffers feature on, the guest will indeed fail receiving large packets with small buffers.

I tested the latest version of the patch on real hardware with different operating systems and different MTUs, using iperf3 and ping -f -c. It seems everything works as expected.
VM1 (Ubuntu 16.04) - VALE - VM2 (Ubuntu 16.04) iperf3 speed 16 Gbits/s
VM1 (FreeBSD 12) - VALE - VM2 (FreeBSD 12) iperf3 speed 11 Gbits/s
VM1 (Ubuntu 16.04) - VALE - VM2 (FreeBSD 12) iperf3 speed 14 Gbits/s

Thanks. This touches also TAP and setups, so we need tests with TAP.
Also, we would need to check that TAP+e1000 works.

This revision is now accepted and ready to land.Jul 5 2019, 4:57 PM
This revision was automatically updated to reflect the committed changes.