Page MenuHomeFreeBSD

tap: add support for virtio-net offloads
ClosedPublic

Authored by vmaffione on Aug 14 2019, 2:32 PM.

Details

Summary

This patch is part of an effort to make bhyve networking (in particular TCP) faster.
The key strategy to enhance TCP throughput is to let the whole packet datapath work with TSO/LRO packets (up to 64KB each), so that the per-packet overhead is amortized over a large number of bytes.
This capability is supported in the guest by means of the vtnet(4) driver, which is able to handle TSO/LRO packets leveraging the virtio-net header (see struct virtio_net_hdr and struct virtio_net_hdr_mrg_rxbuf).
A bhyve VM exchanges packets with the host through a network backend, which can be vale(4) or if_tap(4).
While vale(4) supports TSO/LRO packets, if_tap(4) does not.
This patch extends if_tap(4) with the ability to understand the virtio-net header, so that a tapX interface can process TSO/LRO packets.
A couple of ioctl commands have been added to configure and probe the virtio-net header. Once the virtio-net header is set,
the tapX interface acquires all the IFCAP capabilities necessary for TSO/LRO.

Related patches:
https://reviews.freebsd.org/D20973
https://reviews.freebsd.org/D20987
https://reviews.freebsd.org/D21007

Test Plan

Tested with GENERIC (thus DEBUG enabled), with a bhyve VM connected to a tap0 device configured with an IP address.
The tap0 device has not been bridged to a bridge(4) device.
Netperf tests between the vtnet0 interface in the VM and the tap0 shows 9-10 Gbps in both directions.
Without TSO/LRO support, the same tests show ~3Gbps.

In order to test this patch, the related patches an additional follow-up patch is needed for bhyve to configure the virtio-net header in the if_tap(4) backend.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

vmaffione created this revision.Aug 14 2019, 2:32 PM
vmaffione edited the summary of this revision. (Show Details)Aug 14 2019, 6:32 PM
vmaffione edited the test plan for this revision. (Show Details)
vmaffione added reviewers: kevans, jhb.

Thank you for making this happen! I'm glad that you managed to reuse some code from ptnet. Is taking advantage of these pieces in vtnet on your radar as well?

sys/net/if_tuntap.c
1320 ↗(On Diff #60781)

This section can be simplified with some slight reorganization:

if (tp->tun_vhdrlen != ival) {
        if (tp->tun_vhdrlen == 0)
                ifp->if_capabilities |=
                    TAP_VNET_HDR_CAPS;
        else if (ival == 0)
                ifp->if_capabilities &=
                    ~TAP_VNET_HDR_CAPS;
        tp->tun_vhdrlen = ival;
        ifp->if_capenable = ifp->if_capabilities;
        ifp->if_hwassist = 0;
        if (ifp->if_capenable & IFCAP_TXCSUM)
                ifp->if_hwassist |= CSUM_TCP | CSUM_UDP;
        if (ifp->if_capenable & IFCAP_TXCSUM_IPV6)
                ifp->if_hwassist |= CSUM_TCP_IPV6
                    | CSUM_UDP_IPV6;
        if (ifp->if_capenable & IFCAP_TSO4)
                ifp->if_hwassist |= CSUM_IP_TSO;
        if (ifp->if_capenable & IFCAP_TSO6)
                ifp->if_hwassist |= CSUM_IP6_TSO;
}
1499 ↗(On Diff #60781)

Inverting this check would further simplify things here:

for (;;) {
        IFQ_DEQUEUE(&ifp->if_snd, m);
        if (m != NULL)
                break;
        if (flag & O_NONBLOCK) {
                TUN_UNLOCK(tp);
                return (EWOULDBLOCK);
        }
        tp->tun_flags |= TUN_RWAIT;
        error = mtx_sleep(tp, &tp->tun_mtx, PCATCH | (PZERO + 1),
            "tunread", 0);
        if (error != 0) {
                TUN_UNLOCK(tp);
                return (error);
        }
}
1650 ↗(On Diff #60781)

error would be more consistent with the rest of the file.

1679 ↗(On Diff #60781)

if (error != 0) would be more consistent with the dominant convention used in this file.

vmaffione updated this revision to Diff 60907.Aug 16 2019, 7:37 PM

Addressed suggestions.

markj added inline comments.Aug 16 2019, 7:38 PM
sys/net/if_tuntap.c
1251 ↗(On Diff #60781)

These two lines can be merged now.

1512 ↗(On Diff #60781)

Why this change?

1519 ↗(On Diff #60781)

min() will truncate resid to 32 bits.

vmaffione marked 4 inline comments as done.Aug 16 2019, 7:45 PM

The inline functions added to virtio_net.h were simply copy-pasted from if_vtnet.c to if_ptnet.c in the first place. And yes, I plan to make also if_vtnet.c use the same code in future. I haven't done it yet because if_vtnet.c increments device specific statistic counters.

In any case this patch needs more work to support capabilities enable/disable (e.g. ifconfig tap0 -tso).
Also, it's probably not a good idea to update if_capenable on TAPSVNETHDR, as the current patch does right now, and let the user program do that explicitly.
I'd like to hear the maintainers' opinion here.

sys/net/if_tuntap.c
1320 ↗(On Diff #60781)

Yes, indeed, thanks.

1499 ↗(On Diff #60781)

I know, but I did not want to drop more unrelated changes. Maybe I should just submit those with a separate patch.

In any case this patch needs more work to support capabilities enable/disable (e.g. ifconfig tap0 -tso).
Also, it's probably not a good idea to update if_capenable on TAPSVNETHDR, as the current patch does right now, and let the user program do that explicitly.
I'd like to hear the maintainers' opinion here.

I think I'm personally a solid +0 on the current setup from a tap(4) perspective. The current behavior (with the yet-added SIOCSIFCAP handling) seems to roughly enough match ptnet behavior (i.e. the additional capabilities are enabled by default with PTNETMAP_F_VNET_HDR, assuming I can read this properly) and consistency is good.

On a side note, is there a particular reason that the ptnet(4) manpage is not yet connected to the build, or was this just a small oversight?

sys/net/if_tuntap.c
1499 ↗(On Diff #60781)

Please do. =)

vmaffione updated this revision to Diff 60918.Aug 16 2019, 10:08 PM
vmaffione marked 2 inline comments as done.

Added SIOCSIFCAP support. Addressed more comments.

vmaffione marked 4 inline comments as done.EditedAug 16 2019, 10:14 PM

The ptnet(4) case is a bit different, since the driver knows at attach time which if_capabilities are supported, and those are not going to change.
On the other hand, with these changes to tap(4) the user can change if_capabilities dynamically.

The ptnet(4) man page issue is due to my lack of experience with the build system, sorry for that. I will try to fix it.

sys/net/if_tuntap.c
1499 ↗(On Diff #60781)

I will.

1512 ↗(On Diff #60781)

Unrelated change, removed.

1519 ↗(On Diff #60781)

I've changed len to be size_t... should this do the trick?

vmaffione marked 3 inline comments as done.Aug 16 2019, 10:15 PM
vmaffione updated this revision to Diff 60965.Aug 18 2019, 10:45 AM

Minor changes.

kevans added a comment.Sat, Oct 5, 5:14 PM

I've created some more minor merge conflicts for you in the interim (sorry about that), but they should be trivial to resolve. I have some larger tun/tap changes that may produce more annoying conflicts that I'll shelve until we get this in.

sys/net/if_tuntap.c
807 ↗(On Diff #60965)

Redundant -- the softc will always be allocated w/ M_ZERO. You should reset it in tunclose, though, around when we reset the tun_pid.

1072 ↗(On Diff #60965)

This multi-line comment should be reformatted to move beginning/end markers to newlines.

1079 ↗(On Diff #60965)

I've added TUN_LOCK_ASSERT in the interim -- do consider passing in the tuntap_softc and asserting. =-)

1327 ↗(On Diff #60965)

This multi-line comment and the one a couple lines later should be reformatted to move start/end markers to newlines.

vmaffione updated this revision to Diff 63047.Tue, Oct 8, 6:02 PM
vmaffione marked 4 inline comments as done.

Addressed comments from kevans:

  • style fixes
  • reset vhdrlen to 0 when PID changes; this required the introduction of a new helper function (tun_vnethdr_set)
vmaffione added inline comments.Tue, Oct 8, 6:03 PM
sys/net/if_tuntap.c
807 ↗(On Diff #60965)

Thanks! I completely missed the pid update thing.

kevans added inline comments.Tue, Oct 8, 6:08 PM
sys/net/if_tuntap.c
807 ↗(On Diff #60965)

Sorry, I actually just meant the instance in tunclose where we reset the pid to 0 -- TUNSIFPID historically doesn't mean a whole lot, besides one process opened it then handed it off to a worker process. I don't know if that should necessarily imply a vnethdr reset or not, but I'm also not that familiar with virtio stuff.

vmaffione updated this revision to Diff 63051.Tue, Oct 8, 8:00 PM

Don't call tun_vnethdr_set() on pid change.

vmaffione marked an inline comment as done.Tue, Oct 8, 8:04 PM
vmaffione added inline comments.
sys/net/if_tuntap.c
807 ↗(On Diff #60965)

No, in case of handoff to a worker process it does not make sense to remove the virtio-net header. The worker process will just deal with the tap configuration set by the other process. I removed the call on TUNSIFPID.

kevans accepted this revision.Wed, Oct 16, 1:33 AM

I think I'm happy with this. I did wonder if we should reset capabilities on close now that we support SIOCSIFCAP, but we don't reset mtu or other state manipulated by ioctls (TUNSLMODE/TUNSIFHEAD), so I guess that's a "no."

This revision is now accepted and ready to land.Wed, Oct 16, 1:33 AM
vmaffione marked an inline comment as done.EditedFri, Oct 18, 9:43 PM

Thanks. Yes, I don't see either any reason why we want to reset device configuration (vnet hdr, mtu, etc.) on close.

Thanks. Yes, I don't see either any reason why we want to reset device configuration (vnet hdr, mtu, etc.) on close.

I'm more curious why we don't want to reset it all on close -- once it's closed, it can be reused by any other software that would then have to assume the tun/tap device can be in some indeterminate state with respect to parameters it may not care about at all (e.g. vnethdr, TUNSIFHEAD) that can affect its usage.

This revision was automatically updated to reflect the committed changes.

Il giorno ven 18 ott 2019 alle ore 23:47 kevans (Kyle Evans) <
phabric-noreply@freebsd.org> ha scritto:

kevans added a comment.

In D21263#482518 <https://reviews.freebsd.org/D21263#482518>,

@vmaffione wrote:

> Thanks. Yes, I don't see either any reason why we want to reset device

configuration (vnet hdr, mtu, etc.) on close.

I'm more curious why we don't want to reset it all on close -- once it's

closed, it can be reused by any other software that would then have to
assume the tun/tap device can be in some indeterminate state with respect
to parameters it may not care about at all (e.g. vnethdr, TUNSIFHEAD) that
can affect its usage.

That's right, and if we want to be conservative we should indeed reset the
interface configuration to a "standard state" at tunclose(). The only
potential problem I see is that then it would not be possible to have a
process to configure the tap interface and a separate ones (e.g. maybe
children processes) read/write frames from/to the device. But maybe it's
not something useful, as processes can just pass a tap file descriptor
across fork() and unix sockets.
Btw, as far as I understand file descriptors for /dev/tap are legacy, so
tunclose() may disappear?
In any case being conservative is probably the best idea, so we should
implement this kind of reset. Let me know if you want me to do that.

CHANGES SINCE LAST ACTION

https://reviews.freebsd.org/D21263/new/

REVISION DETAIL

https://reviews.freebsd.org/D21263

EMAIL PREFERENCES

https://reviews.freebsd.org/settings/panel/emailpreferences/

To: vmaffione, kevans, jhb, markj
Cc: ryan_freqlabs.com, aleksandr.fedorov_itglobal.com, marius, markj,
bryanv, imp, ae, gnn, krzysztof.galazka_intel.com

Il giorno ven 18 ott 2019 alle ore 23:47 kevans (Kyle Evans) <
phabric-noreply@freebsd.org> ha scritto:

kevans added a comment.

In D21263#482518 <https://reviews.freebsd.org/D21263#482518>,

@vmaffione wrote:

> Thanks. Yes, I don't see either any reason why we want to reset device

configuration (vnet hdr, mtu, etc.) on close.

I'm more curious why we don't want to reset it all on close -- once it's

closed, it can be reused by any other software that would then have to
assume the tun/tap device can be in some indeterminate state with respect
to parameters it may not care about at all (e.g. vnethdr, TUNSIFHEAD) that
can affect its usage.

That's right, and if we want to be conservative we should indeed reset the
interface configuration to a "standard state" at tunclose(). The only
potential problem I see is that then it would not be possible to have a
process to configure the tap interface and a separate ones (e.g. maybe
children processes) read/write frames from/to the device. But maybe it's
not something useful, as processes can just pass a tap file descriptor
across fork() and unix sockets.

Right, so the proper procedure here is to configure it then transfer it (fork or SCM_RIGHTS).

Btw, as far as I understand file descriptors for /dev/tap are legacy, so
tunclose() may disappear?

tunclose() is actually only ever called for real /dev/tapNN -- /dev/tap doesn't actually exist, our dev_clone listener hijacks lookups for it and creates a tapNN that's not yet allocated. It will be going away in the future, but in favor of a cdevpriv dtor that does the same thing. :-)

In any case being conservative is probably the best idea, so we should
implement this kind of reset. Let me know if you want me to do that.

Nah- what you've committed is good for your part. Thanks!