Bhyve can currently emulate two virtual NICs, namely virtio-net and e1000, and
connect to the host network through two backends, namely tap and netmap.
However, there is no interface between virtual NIC functionalities and backend
functionalities. As a result, the backend code is duplicated between the two virtual
NIC implementations and also within the same virtual NIC. Also, e1000 cannot
currently use netmap as a backend.
This patch introduces a network backend API between virtio-net/e1000 and
tap/netmap, to improve code reuse and add missing functionalities.
Virtual NICs and backends can negotiate virtio-net features, such as checksum
offload and TSO. If the backend supports the features, it will propagate this
information to the guest, so that the latter can make use of them. Currently,
only netmap VALE ports support the features, but support should be added to
tap in the future.
Details
- Reviewers
jhb bryanv - Group Reviewers
bhyve - Commits
- rS350193: MFC r349803
rS349803: bhyve: abstraction for network backends
Diff Detail
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
I like the idea in general.
usr.sbin/bhyve/net_backends.c | ||
---|---|---|
123 | Most of the comments are correct, but the style would be to put the leading '/*' and trailing '*/' on their own lines. A few other comments in the patch also are not quite correct. | |
209 | FreeBSD's style is to put ()'s around return values, so: return (0); here and elsewhere throughout the patch. | |
258 | I think it would be simpler to just have two backend structs for tap vs vmnet and then you don't need all the parsing logic for handling '|', etc. Just have a single 'prefix' and use simple 'strncmp' matching without needing all the token, etc. Having an extra struct is pretty cheap in exchange for the code simplicity. | |
347 | style(9) also does a 4-space continuation indent. | |
364 | Maybe use strlcpy() instead? | |
405 | Is this really better than using memcpy() directly? Note that memcpy() in libc on both Linux and FreeBSD will use optimized routines for the current CPU. | |
787 | Nit: spaces around '+' | |
usr.sbin/bhyve/pci_e82545.c | ||
1047 | Does 'inline' actually make a difference? I recently found in another patch that clang ignored this and made up its own mind to inline or not. | |
usr.sbin/bhyve/pci_virtio_net.c | ||
319 | This seems unrelated (separate change) and is also odd style). Can you instead move this code to when the thread is being created as I think that is more common in the existing bhyve source? | |
468 | Hmm, the name is already set here? |
Thanks for the in depth review. Sorry about the style issues. Is there a FreeBSD clang-format file against which I can validate code?
usr.sbin/bhyve/net_backends.c | ||
---|---|---|
123 | Yes, indeed. Thanks for spotting them. | |
258 | I knew this was going to come up... Yes, it makes sense. | |
364 | Thanks, this slipped through during my rebase. | |
405 | Agreed. This is something actually very old that I still had around. | |
usr.sbin/bhyve/pci_e82545.c | ||
1047 | For sure it doesn't for e1000, so in this case it's useless. | |
usr.sbin/bhyve/pci_virtio_net.c | ||
319 | Indeed, this is a leftover of a previous iteration. Thanks. |
Just a few minor style nits, but I think this is ok once those are addressed.
usr.sbin/bhyve/net_backends.c | ||
---|---|---|
38 | Sorry I missed this earlier, sys/types.h should always be first when it is used. | |
40 | Ideally the other net/*.h headers would be here with net/if.h. pci_virtio_net.c also had the capsicum headers sorted into the regular blocks. | |
226 | return() here as well. | |
272 | s/ot/of/ | |
363 | return () | |
372 | return () | |
533 | I suspect this is longer than 80 cols? | |
579 | s/ot/of/ | |
618 | I prefer to use 'strncmp(...) == 0' rather than !strncmp, and style(9) requests to only use ! for booleans. | |
667 | return () | |
728 | return () | |
798 | return () |
usr.sbin/bhyve/net_backends.c | ||
---|---|---|
786 | What do you think about adding this function to the backend API. For netmap, there is no need to copy data into a dummy buffer, we can just move ring pointers. | |
usr.sbin/bhyve/pci_virtio_net.c | ||
225–226 | I think len < 0 must be handled: if (len < 0) { vq_endchains(vq, 0); return; } Otherwise, we call vq_relchain(vq, idx, (uint32_t)len); with incorrect parameter and increment used_idx. | |
424 | Netmap adds TSO flags to the VTNET_S_HOSTCAPS flags, but the latter contain the VIRTIO_NET_F_MRG_RXBUF flag which is not yet supported. I was able to run this patch only by removing VIRTIO_NET_F_MRG_RXBUF from VTNET_S_HOSTCAPS. |
Thanks for the review. I actually don't have yet a real testbed for this (will get one soon), and I can only do functional tests in a nested KVM VM. Any test is welcome.
usr.sbin/bhyve/net_backends.c | ||
---|---|---|
40 | Done. I hope I did not miss anything. | |
226 | Ok, I thought the rule was not valid for return statements containing a function call. | |
786 | Good point, but I think it's not convenient in practice, and for two reasons. First, once I implement receive backpressure, discard won't happen anymore (at least that's the plan). Second, making this operation faster would be counter productive, since we are giving less time to the guest to replenish. The result is more packet drops. | |
usr.sbin/bhyve/pci_virtio_net.c | ||
225–226 | You are right. I changed the code so to handle len<=0 | |
424 | Indeed, my original patch had VIRTIO_NET_F_MRG_RXBUF removed. Thanks for spotting. |
I tested the latest version of the patch on real hardware with different operating systems and different MTUs, using iperf3 and ping -f -c. It seems everything works as expected.
VM1 (Ubuntu 16.04) - VALE - VM2 (Ubuntu 16.04) iperf3 speed 16 Gbits/s
VM1 (FreeBSD 12) - VALE - VM2 (FreeBSD 12) iperf3 speed 11 Gbits/s
VM1 (Ubuntu 16.04) - VALE - VM2 (FreeBSD 12) iperf3 speed 14 Gbits/s
Thanks. This touches also TAP and setups, so we need tests with TAP.
Also, we would need to check that TAP+e1000 works.