Page MenuHomeFreeBSD

[bhyve] virtio-net: Do not allow receiving packets until features have been negotiated.
ClosedPublic

Authored by afedorov on Nov 26 2020, 12:22 PM.

Details

Summary

This review is a continuation of D22440.

This patch enforces the requirement that the RX callback cannot be called after a reset until the features have been negotiated.

Please take a look:
https://reviews.freebsd.org/D22440
https://svnweb.freebsd.org/base?view=revision&revision=354864
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=242023

I found an issue with bhyve + FreeBSD guest:

guest-freebsd# ifconfig vtnet0 down
vtnet: ndesc (3242) out of range, driver confused?
Assertion failed: (n >= 1 && riov_len + n <= VTNET_MAXSEGS), function pci_vtnet_rx, file /afedorov/freebsd-develop/usr.sbin/bhyve/pci_virtio_net.c, line 309.
                                    Abort trap (core dumped)

and

Nov 25 20:25:40 af-12-1 syslogd: exiting on signal 15
Waiting (max 60 seconds) for system process `vnlru' to stop... done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining... 2 1 0 done
Waiting (max 60 seconds) for system thread `bufdaemon' to stop... done
Waiting (max 60 seconds) for system thread `bufspacedaemon-0' to stop... done
Waiting (max 60 seconds) for system thread `bufspacedaemon-1' to stop... done
Waiting (max 60 seconds) for system thread `bufspacedaemon-2' to stop... done
Waiting (max 60 seconds) for system thread `bufspacedaemon-3' to stop... done
All buffers synced.
Uptime: 1h1m8s
vtnet: ndesc (1347) out of range, driver confused?
Assertion failed: (n >= 1 && riov_len + n <= VTNET_MAXSEGS), function pci_vtnet_rx, file /afedorov/freebsd-develop/usr.sbin/bhyve/pci_virtio_net.c, line 309.
                                    Abort trap (core dumped)
                                                            root@q1u001:/afedorov/vm #

As you can see, bhyve crashes in two cases. When ifconfig vtnet0 down and shutdown -p now are executed from the guest.

The main issue is a race condition where the receive callback is called during device reset.

From the side of bhyve it looks like:

pci_vtnet_reset():
   netbe_rx_disable(sc->vsc_be);
   vi_reset_dev(&sc->vsc_vs):
        vq->vq_last_avail = 0;

pci_vtnet_ping_rxq()
   netbe_rx_enable(sc->vsc_be);

pci_vtnet_rx():
   n = vq_getchain():
        idx = vq->vq_last_avail; /* Equal zero!!! */
        ndesc = (uint16_t)((u_int)vq->vq_avail->va_idx - idx);
        if (ndesc > vq->vq_qsize) return (-1)
   assert(n >= 1 && riov_len + n <= VTNET_MAXSEGS);

In revision 354864, we introduced turning off RX on device reset. But this is not enough, since after pci_vtnet_reset () the guest can call pci_vtnet_ping_rxq () which re-enables RX.

I have not been able to reproduce this situation with a Linux guest. So it might be a bug in the FreeBSD guest driver. But since we already have several releases with this driver (11.4, 12.2, pfSense etc), I think it would be nice to fix it in bhyve.

Test Plan

To reproduce the crash:

# sh vmrun.sh -c 6 -m 8G -t vale0:0 -d freebsd-12-2.img af-12-2

From another terminal:

# pkt-gen -i vale0:1 -f tx

In the guest system, run the following script:

#!/bin/sh

while true
do
        ifconfig vtnet0 up
        ifconfig vtnet0 down
done

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

@vmaffione, what do you think about this patch?

usr.sbin/bhyve/pci_virtio_net.c
114 ↗(On Diff #80018)

features_negotiated is less verbose.

254 ↗(On Diff #80018)

Would it be better to simply return here, and re-enable the rx backend when feature negotation is complete ?

769 ↗(On Diff #80018)

i.e. re-enable rx here and not in pci_vtnet_rx()

I agree with @grehan , but see my comments.

usr.sbin/bhyve/pci_virtio_net.c
114 ↗(On Diff #80018)

+1

254 ↗(On Diff #80018)

Am I wrong or your comment was meant to be for the hunk below (pci_vtnet_ping_rxq)?
In that case I agree with you.

423 ↗(On Diff #80018)

As @grehan suggests, we can return here if features are not negotiated.

usr.sbin/bhyve/pci_virtio_net.c
254 ↗(On Diff #80018)

Both. Any reason why you'd want to disable the backend here if feature aren't yet negotiated, in addition to pci_vtnet_reset() ?

Eliminate issues identified by reviewers.

usr.sbin/bhyve/pci_virtio_net.c
769 ↗(On Diff #80018)

If I understand everything correctly. We should not enable RX before the guest system kicks the RX queue.

usr.sbin/bhyve/pci_virtio_net.c
769 ↗(On Diff #80018)

You're right, my mistake: that code block can be resurrected.

vmaffione added inline comments.
usr.sbin/bhyve/pci_virtio_net.c
254 ↗(On Diff #80018)

In theory sc->features_negotiated should always be true here, otherwise it's another bug.

This revision is now accepted and ready to land.Nov 29 2020, 7:53 PM
usr.sbin/bhyve/pci_virtio_net.c
254 ↗(On Diff #80018)

@vmaffione , you are right. I found another way that can lead to the crash of the FreeBSD guest on shutdown:
pci_vtnet_reset() -> netbe_rx_disable() ->mevent_disable() -> pci_vtnet_rx() -> crash

After mevent_disable (), an unexpected event may occur, because mevent_build () will be called after processing the current event:
mevent_disable() -> mevent_dispatch() -> kevent() -> mevent_handle() -> And, only after that an event really disabled -> mevent_build()

So, pci_vtnet_rx() may be called after pci_vtnet_reset().

This check prevents the guest from falling. But I think we need to fix this. For example like this:

Index: usr.sbin/bhyve/mevent.c
===================================================================
--- usr.sbin/bhyve/mevent.c     (revision 368141)
+++ usr.sbin/bhyve/mevent.c     (working copy)
@@ -225,6 +225,10 @@
        for (i = 0; i < numev; i++) {
                mevp = kev[i].udata;
 
+               /* Skip disabled events */
+               if (mevp->me_state & EV_DELETE)
+                       continue;
+
                /* XXX check for EV_ERROR ? */
 
                (*mevp->me_func)(mevp->me_fd, mevp->me_type, mevp->me_param);

I'd say let's go step by step.
First we should merge this patch, which makes sense as is (unless @grehan has some more observations)?
Then we can talk about the other issue in a separate review.