
vtnet: fix netmap support
ClosedPublic

Authored by vmaffione on Nov 9 2018, 9:07 AM.

Details

Summary

netmap(4) support for vtnet(4) was incomplete and had multiple bugs.
This commit fixes those bugs to bring netmap on vtnet to a functional state.

Changelist:

  • handle errors returned by virtqueue_enqueue() properly (they were previously ignored)
  • make sure that each virtqueue is accessed by either netmap or the rest of the kernel, never by both
  • compute the number of netmap slots for TX and RX separately, according to whether indirect descriptors are used or not for a given virtqueue
  • make sure sglists are freed according to the type of buffers they hold (mbufs or netmap buffers)
  • add support for multiqueue and for the netmap host (a.k.a. sw) rings
  • intercept VQ interrupts directly instead of intercepting them in txq_eof and rxq_eof; this simplifies the code and makes it easier to guarantee that taskqueues are not running for a VQ while it is in netmap mode (see the sketch after this list)
  • implement vtnet_netmap_config() to cope with changes in the number of queues
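
As a rough illustration of the interrupt interception mentioned above, the sketch below shows how a virtqueue interrupt handler can hand the event over to netmap before the regular taskqueue path runs. The handler and field names (vtnet_rxq_vq_intr, vtnrx_sc, vtnrx_id) are illustrative assumptions rather than the exact names in the diff; netmap_rx_irq() is the real netmap kernel API, and a non-zero return value means netmap consumed the interrupt.

  #ifdef DEV_NETMAP
  /* Sketch of an RX virtqueue interrupt handler with netmap interception. */
  static void
  vtnet_rxq_vq_intr(void *xrxq)
  {
          struct vtnet_rxq *rxq = xrxq;
          unsigned int more;      /* unused on FreeBSD, but the pointer must be non-NULL */

          /* If the ring is in netmap mode, netmap wakes up the netmap
           * application and the driver's taskqueue is never scheduled
           * for this VQ. */
          if (netmap_rx_irq(rxq->vtnrx_sc->vtnet_ifp, rxq->vtnrx_id, &more) != 0)
                  return;

          /* ... otherwise fall through to the normal mbuf-based RX processing. */
  }
  #endif /* DEV_NETMAP */
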
Test Plan

I used a FreeBSD VM with 4 vCPUs and 2 GiB of memory, running on a Linux host with the QEMU/KVM hypervisor. The Linux host has a 3rd-generation Intel Core i7 CPU. I use the "vhost" in-kernel accelerator (a Linux feature) to speed up virtio processing on the host.

Test 1

In the VM, I run pkt-gen in tx mode on the vtnet0 interface:

  pkt-gen -i vtnet0 -f tx

On the host, I run pkt-gen in rx mode on the TAP interface that backs the VM's vtnet0:

  pkt-gen -i netmap:tap1_60 -f rx

I measure about 1.5 Mpps.

Test 2

In the VM, I run pkt-gen in rx mode on the vtnet0 interface:

  pkt-gen -i vtnet0 -f rx

On the host, I run pkt-gen in tx mode on the TAP interface that backs the VM's vtnet0:

  pkt-gen -i netmap:tap1_60 -f tx

I measure about 2.1 Mpps.

Test 3

This test exercises the vtnet0 host (sw) rings.
In the VM, I disable all offloads (as required by netmap) and assign an address:

  ifconfig vtnet0 up 10.0.0.1/24 -txcsum -rxcsum -tso4 -tso6 -lro -txcsum6 -rxcsum6

On the host, I assign an address (10.0.0.100/24) to the "br01" interface of the in-kernel bridge to which the VM's TAP interface is attached (so that I can ping 10.0.0.1 from the host).
Then, in the VM, I start the netmap "bridge" program so that it forwards packets between the vtnet0 TX/RX (hardware) rings and the TX/RX host rings:

  bridge -w0 -i netmap:vtnet0

After I do this, I'm still able to ping 10.0.0.1 from the host, which means that host rings are working.
Finally, to check that TCP works across the host rings (e.g., checksums are computed correctly, interrupts are delivered, etc.), I run netcat between the host and the VM. E.g., in the VM:

  nc -l 8000 # listen on port 8000

And on the host:

  nc 10.0.0.1 8000

I observe that the host and the VM can exchange data over the connection.

Diff Detail

Repository
rS FreeBSD src repository - subversion

Event Timeline

Add missing update to if_vtnetvar.h

gnn requested changes to this revision. Nov 10 2018, 7:49 AM

Please update the onoff variable to match how we normally express this in FreeBSD. Other than that small nit, this looks fine.

sys/dev/netmap/if_vtnet_netmap.h
88 ↗(On Diff #50194)

I'd rename onoff to state as that's a more common name for such a boolean in our kernel.

This revision now requires changes to proceed. Nov 10 2018, 7:49 AM

Renamed "onoff" --> "state", as suggested by reviewers.

This comment was removed by vmaffione.

Thanks. Please commit with the standard Approved by: line.

This revision is now accepted and ready to land. Nov 10 2018, 3:55 PM

Can you update this diff with the full context? Thanks.

sys/dev/netmap/if_vtnet_netmap.h
224 ↗(On Diff #50254)

The use of a shared header here (and even more so on the receive side) worries me; it seems rather brittle. Is there some existing vtnet netmap code that un-negotiates most of the offload features?

sys/dev/virtio/network/if_vtnetvar.h
82 ↗(On Diff #50254)

Can these be wrapped in an #ifdef DEV_NETMAP?
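
For illustration, the wrapping being asked for would look roughly like the fragment below (inside the per-queue structure in if_vtnetvar.h); the field name vtnrx_shrhdr is a made-up placeholder, not necessarily what the diff uses.

  #ifdef DEV_NETMAP
  	/* Per-queue state used only by the netmap datapath (placeholder name). */
  	struct virtio_net_hdr_mrg_rxbuf	 vtnrx_shrhdr;
  #endif	/* DEV_NETMAP */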

I'm sorry, how can I generate full diff context with SVN? I cannot find the right command...

sys/dev/netmap/if_vtnet_netmap.h
224 ↗(On Diff #50254)

No, renegotiation happens when the netmap application runs something like

  ifconfig vtnet0 -txcsum -rxcsum -tso4 -tso6 -lro -txcsum6 -rxcsum6

which is something that netmap applications normally do.

The point here is that netmap does not support offloads. It simply ignores and strips the vtnet header on receive, and prepends a zeroed vtnet header on transmit (for compatibility with the hypervisor). So having a shared header is important to avoid dynamic allocation or wasting space on a parallel array.
Using a different shared header for each TX/RX queue avoids cache ping-pong effects in case the backend processes each virtqueue with a different thread (e.g., this is the case for vhost-net on Linux, OVS-DPDK, etc.), or in case the netmap application processes each corresponding netmap ring with a different thread.
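
To make the shared-header scheme concrete, here is a rough sketch of the per-slot TX enqueue: the zeroed per-queue header and the netmap buffer form a two-entry sglist, with everything readable by the device. Names such as txq, shrhdr, nmbuf and slot_len are illustrative assumptions taken from inside the per-slot loop; sglist_reset(), sglist_append() and virtqueue_enqueue() are the existing FreeBSD APIs, and their return values are checked as the changelist says.

  /* Sketch: enqueue one netmap TX slot (illustrative variable names). */
  struct sglist *sg = txq->vtntx_sg;
  int err;

  sglist_reset(sg);
  err = sglist_append(sg, shrhdr, sc->vtnet_hdr_size);	/* zeroed shared header */
  err |= sglist_append(sg, nmbuf, slot_len);		/* netmap buffer with the frame */
  KASSERT(err == 0, ("%s: failed to append to sglist", __func__));

  /* On TX both segments are readable by the device; nothing is writable. */
  err = virtqueue_enqueue(txq->vtntx_vq, /*cookie*/ nmbuf, sg, sg->sg_nseg, 0);
  if (err != 0)
  	break;	/* no more room in the virtqueue: stop and retry later */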

vmaffione marked an inline comment as done.

Added DEV_NETMAP in if_vtnetvar.h
Provided full diff context.

This revision now requires review to proceed. Nov 11 2018, 10:30 AM
sys/dev/netmap/if_vtnet_netmap.h
299 ↗(On Diff #50281)

Either check or assert the value of err?

224 ↗(On Diff #50254)

Is there any code that disables the mergeable receive buffers feature? With mergeable buffers, vtnet_netmap_kring_refill() does not quite look correct because the header does not get a separate descriptor.

vmaffione marked 3 inline comments as done.

Added assertions for sglist_append().
Fixed indentation.

sys/dev/netmap/if_vtnet_netmap.h
224 ↗(On Diff #50254)

No, currently we do not disable mergeable receive buffers, but we could do that on the reinit cycle triggered by vtnet_netmap_reg().

I do not understand your observation about the separate descriptor in case of mergeable buffers, though.
First, the header does get a separate virtio descriptor, as you can see at line 299. The second descriptor (line 300) in the chain points to the netmap buffer.
Second, why would a separate descriptor be necessary? On the contrary, with mergeable buffers the guest does not know how a certain buffer passed to virtqueue_enqueue() will be used for a received packet: it may be used for the beginning of the packet, it might be in the middle, or at the end. And indeed when mergeable buffers are used, sc->vtnet_rx_nsegs==1, i.e., the driver publishes buffers one at a time, with no virtio descriptors "dedicated" to the virtio-net header.
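
For reference, the two-descriptor receive refill being described (diff lines 299-300) looks roughly like the sketch below, with illustrative variable names; the only difference from the TX side is that both segments are writable by the device.

  /* Sketch: publish one netmap RX buffer (illustrative variable names). */
  sglist_reset(sg);
  err = sglist_append(sg, shrhdr, sc->vtnet_hdr_size);	 /* header, filled in by the device */
  err |= sglist_append(sg, nmbuf, NETMAP_BUF_SIZE(na));	 /* netmap buffer for the payload */
  KASSERT(err == 0, ("%s: failed to append to sglist", __func__));

  /* On RX there are no readable segments; the device writes both. */
  err = virtqueue_enqueue(rxq->vtnrx_vq, /*cookie*/ nmbuf, sg, 0, sg->sg_nseg);
  if (err != 0)
  	break;	/* virtqueue full: stop refilling */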

sys/dev/netmap/if_vtnet_netmap.h
224 ↗(On Diff #50254)

With the mergeable buffers feature, the host device can spread a received packet across multiple descriptors, storing the number of descriptors used in the header's num_buffers field. I see this being broken when the header is shared.

You might not hit this because either your host device does not support mergeable buffers, or, since LRO is disabled, your receive buffers are always large enough that num_buffers is always one.
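
For context, the header layout involved is the standard mergeable-rxbuf variant: on every received packet the device writes the number of descriptors it used into num_buffers, which is why a single header shared by all the buffers of a queue can get clobbered as soon as num_buffers can exceed one.

  struct virtio_net_hdr_mrg_rxbuf {
  	struct virtio_net_hdr	hdr;
  	uint16_t		num_buffers;	/* descriptors used for this packet */
  };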

vmaffione added inline comments.
sys/dev/netmap/if_vtnet_netmap.h
224 ↗(On Diff #50254)

Absolutely. This was not under discussion. As I said, we can disable mergeable buffers in vtnet_netmap_reg(), and I will do that.
I was commenting on the need for a virtio descriptor dedicated to the virtio-net header.

Btw, my host supports mergeable buffers, but large buffers never show up because in the guest I disable txcsum/rxcsum, which implies no TSO and therefore no large frames.

sys/dev/netmap/if_vtnet_netmap.h
224 ↗(On Diff #50254)

The support for mergeable buffers is determined at attach time, and various pieces of the driver are configured/sized accordingly; they are not intended to be changed on the fly at reinit time. I think that by the time of your netmap register call it is too late (and disabling the feature there would impact non-netmap users).

Ultimately I feel you're going to have to implement full receive feature support in netmap, or at least fail the netmap register when unsupported features have been negotiated. Otherwise, you risk corruption or other hard-to-find bugs. Also, I have support for VirtIO V1 [1] that I hope to commit soon, which further complicates the Rx/Tx contract when negotiated.

I'm fine if you want to commit this, since it does seem to be an improvement, but I think additional work will eventually be needed. After VirtIO V1 support is committed I would really like to see if we can refactor all of this so we don't have to maintain two Rx/Tx path implementations.

[1] - https://github.com/bryanv/freebsd/blob/virtio/sys/dev/virtio/network/if_vtnet.c
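
One way to realize the "fail the netmap register when unsupported features were negotiated" option would be a guard along these lines in the netmap register path; this is only a sketch using the existing virtio_with_feature() helper, and the exact placement and return value are assumptions.

  /* Sketch: refuse netmap mode if a feature netmap cannot honour was negotiated. */
  if (virtio_with_feature(sc->vtnet_dev, VIRTIO_NET_F_MRG_RXBUF)) {
  	device_printf(sc->vtnet_dev,
  	    "cannot enable netmap: mergeable receive buffers negotiated\n");
  	return (EOPNOTSUPP);
  }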

sys/dev/virtio/network/if_vtnet.c
1883 ↗(On Diff #50304)

more is not initialized at this point, and then assigned before being used later on, so is it needed here?

2531 ↗(On Diff #50304)

To generally match the style used elsewhere in this file, can you add a != 0 here?

vmaffione added inline comments.
sys/dev/netmap/if_vtnet_netmap.h
224 ↗(On Diff #50254)

Yes, I've noticed the pieces that are sized accordingly.
The netmap register is the equivalent of the IFF_UP and IFF_DOWN routines for netmap interfaces, so it can be called multiple times. It's not an "attach", so you are right.
Support for offloads/features is something that is under discussion, but it must be designed in the netmap core first, before modifying the drivers.

So for now I would commit this, which is already a big improvement and does not affect non-netmap users.
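
To show what "equivalent of IFF_UP and IFF_DOWN" means in practice, here is a sketch of the general shape of the register callback; it is not necessarily the exact committed code, but nm_set_native_flags()/nm_clear_native_flags() and the vtnet stop/init routines are the real building blocks.

  /* Sketch: netmap register callback, invoked on each netmap open/close. */
  static int
  vtnet_netmap_reg(struct netmap_adapter *na, int state)
  {
  	struct ifnet *ifp = na->ifp;
  	struct vtnet_softc *sc = ifp->if_softc;

  	VTNET_CORE_LOCK(sc);
  	vtnet_stop(sc);				/* quiesce the data path */
  	if (state)
  		nm_set_native_flags(na);	/* switch the rings to netmap mode */
  	else
  		nm_clear_native_flags(na);
  	vtnet_init_locked(sc);			/* reinitialize with the new mode */
  	VTNET_CORE_UNLOCK(sc);

  	return (0);
  }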

sys/dev/virtio/network/if_vtnet.c
1883 ↗(On Diff #50304)

The value returned in the third argument is used on Linux and not used on FreeBSD. Something non-NULL must be provided to tell netmap that this is an RX interrupt (if NULL, netmap assumes it is a TX interrupt).
In other words, this is harmless.

2531 ↗(On Diff #50304)

Fixed here and also for the netmap_rx_irq call.

This revision is now accepted and ready to land. Nov 14 2018, 2:58 PM
This revision was automatically updated to reflect the committed changes.