
taskqueue_drain_all(): do not deadlock
Abandoned, Public

Authored by kib on Aug 30 2024, 3:30 PM.

Details

Reviewers
markj
jhb
Summary

if called from the taskqueue thread context, and the taskqueue is single-threaded.

It is correct because no code other than the taskqueue's own functions should ever be executed in this context.
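
For context, a minimal sketch of the idea (this is not the actual diff; the tq_tcount field name and the placement of the early return are assumptions about how such a check could look in subr_taskqueue.c, reusing the existing taskqueue_member() helper):

/*
 * Sketch only: if the (single) thread servicing a taskqueue calls
 * taskqueue_drain_all() on its own queue, return immediately instead of
 * sleeping, since the sleeping thread is the only one that could run the
 * remaining tasks.
 */
void
taskqueue_drain_all(struct taskqueue *queue)
{

	if (queue->tq_tcount == 1 && taskqueue_member(queue, curthread))
		return;

	/*
	 * ... existing logic: enqueue a barrier task and sleep until every
	 * task queued before it has completed ...
	 */
}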

Diff Detail

Repository
rG FreeBSD src repository

Event Timeline

sys/kern/subr_taskqueue.c
632

I don't think this is sufficient. Suppose an unrelated task is enqueued after the current task. We'll still block forever, no?

sys/kern/subr_taskqueue.c
632

Could you please explain the situation in more detail? I do not follow.

The patch makes drain_all() not sleep at all if it is called from a task context. Then whatever is scheduled after the task ('after' makes sense because the taskqueue is single-threaded) is not an issue; we do not need to care about it.

sys/kern/subr_taskqueue.c
632

Oh I see, I misunderstood the patch.

But this change defeats my original patch: key_vnet_destroy() may trigger some queuing of work from ipsec_offload code. We need to wait for that to finish somehow.

sys/kern/subr_taskqueue.c
632

I still think that this change is the right fix, regardless of the original issue.

sys/kern/subr_taskqueue.c
632

IMO it is too magical. It would be better to simply panic if a taskqueue thread tries to drain itself. It is always a bug, I believe.

sys/kern/subr_taskqueue.c
632

Except when relying on single-threaded taskqueues, for instance. I do not agree that this is a bug.

I think trying to drain on the taskqueue from "within" the taskqueue context is a logic bug. Stopping a task from a task handler seems fine, but draining seems like something that you should do "outside". This is already true for some other primitives (callout_drain, bus_teardown_intr) though in those cases you can't sleep in the handler context anyway. I would rather have a KASSERT that you can't drain from within a taskqueue handler to catch the logic bug explicitly.
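
For illustration, such an assertion could look roughly like this (a sketch, not part of the reviewed diff, reusing the existing taskqueue_member() helper):

/*
 * Sketch: fail loudly instead of silently returning or deadlocking when a
 * thread of the queue being drained calls taskqueue_drain_all().
 */
KASSERT(!taskqueue_member(queue, curthread),
    ("%s: called from a thread of the drained taskqueue", __func__));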

It is not drain as such, but more a fence splitting before and after events, IMO. From this PoV, it is fine to call it in any context, and it is sometimes hard to predict whether it would occur in the prohibited one. So I do not agree, but since the change is not needed right now, I abandon it.

In D46489#1059748, @kib wrote:

It is not drain as such, but more a fence splitting before and after events, IMO.

But it also blocks until all tasks queued before the fence are executed. So if a task 1) queues additional tasks on the same queue and 2) calls taskqueue_drain_all() from the same task context, it will deadlock.
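
Schematically, that pattern looks like this (my_task_fn and some_other_task are hypothetical names, shown only to make the self-deadlock concrete):

#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/taskqueue.h>

static struct task some_other_task;	/* assume TASK_INIT()'d elsewhere */

/* Runs on a single-threaded taskqueue passed as the task argument. */
static void
my_task_fn(void *arg, int pending)
{
	struct taskqueue *tq = arg;

	/* 1) This lands on the queue behind the currently running task. */
	taskqueue_enqueue(tq, &some_other_task);

	/*
	 * 2) drain_all() waits for everything queued so far, including
	 * some_other_task, but the only thread that could run it is the one
	 * sleeping here: deadlock.
	 */
	taskqueue_drain_all(tq);
}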

From this PoV, it is fine to call it in any context, and it is sometimes hard to predict whether it would occur in the prohibited one.

IMO, the root of the problem is that the taskqueue KPI is too low-level for usages like the one in ipsec_accel. To avoid creating too many special-purpose threads, many consumers share a single queue serviced by dedicated threads, and interactions between those consumers (ipsec_accel and jail/vnet, in this case) can cause deadlocks. I think a better design would be to push knowledge of the threading model into the taskqueue internals: let consumers define their own separate queues without specifying how they are mapped to threads, and let the taskqueue implementation decide how the queues are scheduled. Then, if a vnet task needs to drain the ipsec_offload queue, the taskqueue layer can safely dispatch a different thread (from a dynamically sized pool) to handle it.

There should be no need for taskqueue to create a dedicated thread for ipsec_offload; it just needs to maintain a queue. More generally, we have way too many kernel threads because the taskqueue interface forces it. ZFS creates an absurd number of threads on its own for this reason.

So I do not agree, but since the change is not needed right now, I abandon it.