usr.sbin/bhyve/net_backends.c
260	Correct. I will remove this once the kernel-clean-up-on-close goes in. In any case, this patch has two dependencies that must go in first: https://reviews.freebsd.org/D20987 https://reviews.freebsd.org/D21007

jhb added inline comments.Nov 7 2019, 5:17 PM

usr.sbin/bhyve/net_backends.c
195	Tiny nit: It would be good to be consistent with the style of block comments, i.e. /* * FOO */
260	Has the kernel cleanup gone in?
283	Maybe just check the result of strlcpy() to detect overflow?

kevans added inline comments.Nov 7 2019, 5:19 PM

usr.sbin/bhyve/net_backends.c
260	tuntap(4) resetting vnethdr on close went in with the tuntap vnethdr support, at least.

Addressed reviewer comments.

vmaffione added inline comments.Nov 7 2019, 9:56 PM

usr.sbin/bhyve/net_backends.c
260	Yes, vnethdr is now reset to 0 from `tundtor()`, which happens when the bhyve process terminates.

jhb accepted this revision.Nov 8 2019, 5:26 PM

This revision is now accepted and ready to land.Nov 8 2019, 5:26 PM

Unfortunately there is a blocker for this change.

From what I can see (and I may be wrong), the mbuf does not have enough space to hold the metadata in struct virtio_net_hdr, at least in the packet fowarding case (e.g. a packet received from tap0 to be forwarded to tap1 or to em0). The metadata contains information about how to compute the checksum and perform the TCP segmentation. What we want to do is to delay the checksum computation and TCP segmentation until the mbuf physically exits the machine (where it can be performed in hardware by the physical NIC), or never perform those operations if the mbuf never leaves the physical machine. This is the whole point of virtio-net offloads, and the key to high speed TCP.

Consider the checksum metdatata: mbuf csum_data on tap0 receive is set to 0xFFFF (and CSUM_DATA_VALID is set in csum_flags), but the checksum has not really been computed yet. If the mbuf is delivered locally, everything works fine with the local TCP input processing. However, if the mbuf is forwarded to another member of bridge0 (e.g. tap1), we need to propagate the metadata, and so we should set CSUM_TCP in csum_flags and set csum_data to the checksum offset in the L4 header (e.g. 16 for TCP). But doing so would break the local delivery case. As a matter of facts, this patch improves the TCP throughput between VM vtnet0 and host bridge0 from 3Gbps to 10Gbps, but it breaks TCP/UDP networking between VM vtnet0 and the external network (and other VMs on the same host), because metadata are not correctly propagated and eventually the mbuf is dropped because of invalid checksum. The root problem is that csum_data has different meanings on local transmit and receive, and therefore cannot be reused to propagate metadata.

Note that the problem also affects if_vtnet(4) interfaces in the guest, if you try to bridge two of them through bridge0. TCP forwarding won't work (although local receive and local transmit does).

With enough space in the mbuf to store the metadata, the problem would go away.
For example, in VALE the virtio-net header is prepended to the payload of each packet.

Any opinions on this issue?

vmaffione added a subscriber: marius.Nov 9 2019, 5:56 PM

zlei added a subscriber: zlei.Sep 22 2024, 12:56 AM