Enhance the tap(4) net backend to allow bhyve VMs to offload checksum and TSO computation.
This patch depends on https://reviews.freebsd.org/D21263 .
|Unit Tests Skipped|
Unfortunately there is a blocker for this change.
From what I can see (and I may be wrong), the mbuf does not have enough space to hold the metadata in struct virtio_net_hdr, at least in the packet fowarding case (e.g. a packet received from tap0 to be forwarded to tap1 or to em0). The metadata contains information about how to compute the checksum and perform the TCP segmentation. What we want to do is to delay the checksum computation and TCP segmentation until the mbuf physically exits the machine (where it can be performed in hardware by the physical NIC), or never perform those operations if the mbuf never leaves the physical machine. This is the whole point of virtio-net offloads, and the key to high speed TCP.
Consider the checksum metdatata: mbuf csum_data on tap0 receive is set to 0xFFFF (and CSUM_DATA_VALID is set in csum_flags), but the checksum has not really been computed yet. If the mbuf is delivered locally, everything works fine with the local TCP input processing. However, if the mbuf is forwarded to another member of bridge0 (e.g. tap1), we need to propagate the metadata, and so we should set CSUM_TCP in csum_flags and set csum_data to the checksum offset in the L4 header (e.g. 16 for TCP). But doing so would break the local delivery case. As a matter of facts, this patch improves the TCP throughput between VM vtnet0 and host bridge0 from 3Gbps to 10Gbps, but it breaks TCP/UDP networking between VM vtnet0 and the external network (and other VMs on the same host), because metadata are not correctly propagated and eventually the mbuf is dropped because of invalid checksum. The root problem is that csum_data has different meanings on local transmit and receive, and therefore cannot be reused to propagate metadata.
Note that the problem also affects if_vtnet(4) interfaces in the guest, if you try to bridge two of them through bridge0. TCP forwarding won't work (although local receive and local transmit does).
With enough space in the mbuf to store the metadata, the problem would go away.
For example, in VALE the virtio-net header is prepended to the payload of each packet.
Any opinions on this issue?