Page MenuHomeFreeBSD

netmap: fix lock order reversal related to kqueue usage (v2)
ClosedPublic

Authored by vmaffione on Jan 25 2019, 5:13 PM.

Details

Summary

When using poll(), select() or kevent() on netmap file descriptors, netmap executes the equivalent of NIOCTXSYNC and NIOCRXSYNC commands, before collecting the events that are ready. In other words, the poll/kevent callback has side effects. This is done to avoid the overhead of two system call per iteration (e.g., poll() + ioctl(NIOC*XSYNC)).

When the kqueue subsystem invokes the kqueue(9) f_event callback (netmap_knrw), it holds the lock of the
struct knlist object associated to the netmap port (the lock is provided at initialization, by calling knlist_init_mtx).
However, netmap_knrw() may need to wake up another netmap port (or even the same one), which means that it may need to call knote(). Since knote() needs the lock of the struct knlist object associated to the to-be-wake-up netmap
port, it is possible to have a lock order reversal problem (AB/BA deadlock).

This change prevents the deadlock by executing the knote() call in a per-selinfo taskqueue, where it is possible to hold a mutex.

Test Plan

Test bhyve using VALE with kqueue(2).
See other tests discussed here https://reviews.freebsd.org/D18876

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

vmaffione created this revision.Jan 25 2019, 5:13 PM

Looks good. I will try to test this patch tomorrow.

I could not reproduce the bug with my testsuite.

Thanks for testing.
I think this is safe to commit.

sys/dev/netmap/netmap_freebsd.c
1332 ↗(On Diff #53201)

Is it good to use global taskqueue_swi? Maybe we should create a private taskqueue with a dedicated processing thread?

I went for reusing a system-wide taskqueue because the work to be done is minimal (i.e. it is not something like processing a number of descriptors in a device queue, or a list of packets, etc.).
However, if you think it's better to use a netmap-wide taskqueue for this, we can go for it.
Are you worried about something in particular, w.r.t. the current approach?

I'm worried about the increased load on the kernel thread that serves the taskqueue_swi and how it will affect the rest of the system.

PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
2624 root        101    0    54M  5168K CPU5     5   0:29 100.00% pkt-gen{pkt-gen}
2625 root        101    0    54M  5168K CPU2     2   0:28 100.00% pkt-gen{pkt-gen}
2623 root        101    0    54M  5140K CPU15   15   0:30  95.75% pkt-gen{pkt-gen}
2626 root        101    0    54M  5168K CPU1     1   0:28  94.68% pkt-gen{pkt-gen}
2619 root         96    0    51M  2248K CPU12   12   0:24  82.08% netmap_rx
2621 root         96    0    51M  2248K CPU16   16   0:25  79.69% netmap_rx
2620 root         96    0    51M  2248K CPU7     7   0:25  79.39% netmap_rx
2622 root         52    0    51M  2248K CPU9     9   0:25  77.69% netmap_rx
  12 root        -52    -      0   480K WAIT    19   0:12  39.60% intr{swi6: task queue}

Also tasqueue (9) indicated that:

Note that tasks are queued on shared tasks such as taskqueue_swi may be
delayed an indeterminate amount of time before execution. If queuing
delays cannot be tolerated then a private taskqueue should be created
with a dedicated processing thread.

That may not be very good for netmap performance.

That's convincing, thanks. I will try to switch to a per-netmap-adapter private taskqueue, to avoid locking contention on the same taskqueue.

vmaffione updated this revision to Diff 53385.Jan 29 2019, 3:19 PM

knote() notification work deferred to a per-selinfo taskqueue

vmaffione marked an inline comment as done.Jan 29 2019, 3:20 PM

I did not find any problems with the latest patch.

vmaffione edited the summary of this revision. (Show Details)Jan 30 2019, 10:24 AM

Thanks for testing.

This revision was not accepted when it landed; it landed in state Needs Review.Jan 30 2019, 3:52 PM
This revision was automatically updated to reflect the committed changes.