Page MenuHomeFreeBSD

netmap: fix lock order reversal related to kqueue usage
AbandonedPublic

Authored by vmaffione on Jan 25 2019, 9:32 AM.
Tags
None
Referenced Files
Unknown Object (File)
Oct 3 2024, 5:52 AM
Unknown Object (File)
Oct 1 2024, 11:08 AM
Unknown Object (File)
Sep 27 2024, 9:30 PM
Unknown Object (File)
Sep 27 2024, 9:05 PM
Unknown Object (File)
Sep 27 2024, 4:54 PM
Unknown Object (File)
Sep 13 2024, 10:10 PM
Unknown Object (File)
Aug 18 2024, 11:27 AM
Unknown Object (File)
Aug 14 2024, 8:34 AM

Details

Reviewers
afedorov
Summary

When using poll(), select() or kevent() on netmap file descriptors, netmap executes the equivalent of NIOCTXSYNC and NIOCRXSYNC commands, before collecting the events that are ready. In other words, the poll/kevent callback has side effects. This is done to avoid the overhead of two system call per iteration (e.g., poll() + ioctl(NIOC*XSYNC)).

When the kqueue subsystem invokes the kqueue(9) f_event callback (netmap_knrw), it holds the lock of the
struct knlist object associated to the netmap port (the lock is provided at initialization, by calling knlist_init_mtx).
However, netmap_knrw() may need to wake up another netmap port (or even the same one), which means that it may need to call knote(). Since knote() needs the lock of the struct knlist object associated to the to-be-wake-up netmap
port, it is possible to have a lock order reversal problem (AB/BA deadlock).

This change prevents the deadlock by introducing a global lock for the struct knlist object of all the netmap ports.

Test Plan

Test bhyve using VALE with kqueue(2).
See other tests discussed here https://reviews.freebsd.org/D18876

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

I conducted the following test. Simultaneous start:

  • 8 (RX) netmap_test processes (D18876) on different vale (0 - 7) ports.
  • 8 (TX) pkt-gen processes on different vale (0 - 7) ports

And there was no deadlock or warning message from witness (4).

As for reproducibility of the tests, the deadlock is easily reproduced on the Xeon E5-2630 v4 with a non-patched kernel, but it is very difficult on a weak hardware like the AMD A6 APU.

Thanks for testing. So this seems to work.
However, introducing a global lock is not the best approach for performance reasons. I tried to came up with a different approach that also allows us to get rid of the mtx_owned() call.
See https://reviews.freebsd.org/D18956