This review is a continuation of D22440.
This patch enforces the requirement that the RX callback cannot be called after a reset until the features have been negotiated.
Please take a look:
https://reviews.freebsd.org/D22440
https://svnweb.freebsd.org/base?view=revision&revision=354864
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=242023
I found an issue with bhyve and a FreeBSD guest:
guest-freebsd# ifconfig vtnet0 down
vtnet: ndesc (3242) out of range, driver confused?
Assertion failed: (n >= 1 && riov_len + n <= VTNET_MAXSEGS), function pci_vtnet_rx, file /afedorov/freebsd-develop/usr.sbin/bhyve/pci_virtio_net.c, line 309.
Abort trap (core dumped)
and
Nov 25 20:25:40 af-12-1 syslogd: exiting on signal 15
Waiting (max 60 seconds) for system process `vnlru' to stop... done
Waiting (max 60 seconds) for system process `syncer' to stop... Syncing disks, vnodes remaining... 2 1 0 done
Waiting (max 60 seconds) for system thread `bufdaemon' to stop... done
Waiting (max 60 seconds) for system thread `bufspacedaemon-0' to stop... done
Waiting (max 60 seconds) for system thread `bufspacedaemon-1' to stop... done
Waiting (max 60 seconds) for system thread `bufspacedaemon-2' to stop... done
Waiting (max 60 seconds) for system thread `bufspacedaemon-3' to stop... done
All buffers synced.
Uptime: 1h1m8s
vtnet: ndesc (1347) out of range, driver confused?
Assertion failed: (n >= 1 && riov_len + n <= VTNET_MAXSEGS), function pci_vtnet_rx, file /afedorov/freebsd-develop/usr.sbin/bhyve/pci_virtio_net.c, line 309.
Abort trap (core dumped)
root@q1u001:/afedorov/vm #
As you can see, bhyve crashes in two cases: when "ifconfig vtnet0 down" or "shutdown -p now" is executed in the guest.
The main issue is a race condition where the receive callback is called during device reset.
From bhyve's side, the sequence looks like this:
    pci_vtnet_reset():
        netbe_rx_disable(sc->vsc_be);
        vi_reset_dev(&sc->vsc_vs):
            vq->vq_last_avail = 0;
    pci_vtnet_ping_rxq():
        netbe_rx_enable(sc->vsc_be);
    pci_vtnet_rx():
        n = vq_getchain():
            idx = vq->vq_last_avail;    /* Equals zero! */
            ndesc = (uint16_t)((u_int)vq->vq_avail->va_idx - idx);
            if (ndesc > vq->vq_qsize)
                return (-1);
        assert(n >= 1 && riov_len + n <= VTNET_MAXSEGS);    /* fails: n == -1 */
In revision 354864, we started disabling RX on device reset. However, this is not sufficient: after pci_vtnet_reset(), the guest can still notify the RX queue, which invokes pci_vtnet_ping_rxq() and re-enables RX.
I have not been able to reproduce this with a Linux guest, so it may be a bug in the FreeBSD guest driver. However, since several releases already ship with this driver (11.4, 12.2, pfSense, etc.), I think it is worth fixing in bhyve.