
bhyve: tap: add support for offloads
Changes PlannedPublic

Authored by vmaffione on Aug 18 2019, 10:48 AM.

Details

Reviewers
jhb
markj
kevans
tychon
Group Reviewers
bhyve
Summary

Enhance the tap(4) net backend to allow bhyve VMs to offload checksum computation and TCP segmentation (TSO) to the host.

This patch depends on https://reviews.freebsd.org/D21263 .

Diff Detail

Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

vmaffione created this revision. Aug 18 2019, 10:48 AM
kevans added inline comments. Oct 19 2019, 2:49 PM
usr.sbin/bhyve/net_backends.c
260

This should be able to go away since we reset vnethdr on tunclose in tuntap(4).

vmaffione added inline comments. Oct 20 2019, 10:09 AM
usr.sbin/bhyve/net_backends.c
260

Correct. I will remove this once the kernel-clean-up-on-close goes in.
In any case, this patch has two dependencies that must go in first:
https://reviews.freebsd.org/D20987
https://reviews.freebsd.org/D21007

jhb added inline comments. Nov 7 2019, 5:17 PM
usr.sbin/bhyve/net_backends.c
195

Tiny nit: It would be good to be consistent with the style of block comments, i.e.

/*
 * FOO
 */
260

Has the kernel cleanup gone in?

283

Maybe just check the result of strlcpy() to detect overflow?

kevans added inline comments. Nov 7 2019, 5:19 PM
usr.sbin/bhyve/net_backends.c
260

tuntap(4) resetting vnethdr on close went in with the tuntap vnethdr support, at least.

vmaffione updated this revision to Diff 64064. Nov 7 2019, 9:50 PM
vmaffione marked 5 inline comments as done.

Addressed reviewer comments.

vmaffione added inline comments. Nov 7 2019, 9:56 PM
usr.sbin/bhyve/net_backends.c
260

Yes, vnethdr is now reset to 0 from tundtor(), which happens when the bhyve process terminates.

jhb accepted this revision. Nov 8 2019, 5:26 PM
This revision is now accepted and ready to land. Nov 8 2019, 5:26 PM
vmaffione planned changes to this revision. Edited Nov 9 2019, 5:56 PM

Unfortunately there is a blocker for this change.

From what I can see (and I may be wrong), the mbuf does not have enough space to hold the metadata in struct virtio_net_hdr, at least in the packet forwarding case (e.g. a packet received from tap0 to be forwarded to tap1 or to em0). The metadata describes how to compute the checksum and perform TCP segmentation. What we want to do is delay the checksum computation and TCP segmentation until the mbuf physically leaves the machine (where the physical NIC can perform them in hardware), or never perform those operations at all if the mbuf never leaves the machine. This is the whole point of virtio-net offloads, and the key to high-speed TCP.

Consider the checksum metadata: on tap0 receive, mbuf csum_data is set to 0xFFFF (and CSUM_DATA_VALID is set in csum_flags), even though the checksum has not actually been computed yet. If the mbuf is delivered locally, everything works fine with the local TCP input processing. However, if the mbuf is forwarded to another member of bridge0 (e.g. tap1), we need to propagate the metadata, and so we should set CSUM_TCP in csum_flags and set csum_data to the checksum offset in the L4 header (e.g. 16 for TCP). But doing so would break the local delivery case. In fact, this patch improves TCP throughput between VM vtnet0 and host bridge0 from 3 Gbps to 10 Gbps, but it breaks TCP/UDP networking between VM vtnet0 and the external network (and other VMs on the same host), because the metadata is not correctly propagated and the mbuf is eventually dropped due to an invalid checksum. The root problem is that csum_data has different meanings on local transmit and receive, and therefore cannot be reused to propagate the metadata.

Note that the problem also affects if_vtnet(4) interfaces in the guest, if you try to bridge two of them through bridge0: TCP forwarding won't work (although local receive and local transmit do).

With enough space in the mbuf to store the metadata, the problem would go away.
For example, in VALE the virtio-net header is prepended to the payload of each packet.

Any opinions on this issue?