Page MenuHomeFreeBSD

cam: Run all XPT_ASYNC ccbs in a dedicated thread
ClosedPublic

Authored by imp on Mar 10 2021, 11:06 PM.
Tags
None
Referenced Files
F80139853: D29210.id85581.diff
Thu, Mar 28, 11:35 AM
Unknown Object (File)
Mon, Mar 18, 9:51 PM
Unknown Object (File)
Feb 21 2024, 7:31 PM
Unknown Object (File)
Feb 21 2024, 7:31 PM
Unknown Object (File)
Jan 27 2024, 9:27 AM
Unknown Object (File)
Jan 14 2024, 9:22 AM
Unknown Object (File)
Dec 23 2023, 1:31 AM
Unknown Object (File)
Dec 17 2023, 9:35 AM
Subscribers

Details

Summary

Queue all XPT_ASYNC ccb's and run those in a new cam async thread. This thread
is allowed to have bouneded sleeps. This should allow us to make all the
registration routines for cam periph driverrs simpler since they can assume they
can always allocate memory. This is a separate thread so that any I/O that's
completed in xpt_done_td isn't held up.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

imp requested review of this revision.Mar 10 2021, 11:06 PM
sys/cam/nvme/nvme_xpt.c
765 ↗(On Diff #85533)

Oh, this is wrong, or at least not relevant for this patch.

LGTM, for whatever little that's worth.

This revision is now accepted and ready to land.Mar 10 2021, 11:28 PM

My "build machine" runs a plain GENERIC kernel, so when it's running head, the INVARIANTS option is enabled. Its normal daily update got to main-n245338-221622ec0c8e, but after updating sources to main-n245363-b3dac3913dc9, it panicked (after successful build/install/reboot) before entering multi-user mode. Likewise after updating to main-n245372-d1cbe7908986.

I applied the diff from this review to the (clean) sources at main-n245372-d1cbe7908986; rebuilt/installed; the following reboot did make it to multi-user mode, and now reports:

freebeast(14.0-C)[3] uname -aUK
FreeBSD freebeast.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #1209 main-n245372-d1cbe7908986-dirty: Wed Mar 10 15:23:53 PST 2021     root@freebeast.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/GENERIC  amd64 1400005 1400005

which seems rather like a "win" to me.

sys/cam/cam_xpt.c
3169

The wakeup() should be done before you unlock the lock, otherwise the worker thread can miss the wakeup. (xpt_done() has the same problem.)

sys/cam/cam_xpt.c
3169

I suspect that if we move that, we can get rid of the run interlock too... At best it saves extra wakeups in the rare case where races are lost if we move this... I'll do a separate commit for the done part.

This revision now requires review to proceed.Mar 11 2021, 5:24 PM
This revision is now accepted and ready to land.Mar 11 2021, 6:11 PM

I re-tested after imp@'s update of Thu, Mar 11, 09:24; no issues:

freebeast(14.0-C)[1] uname -aUK
FreeBSD freebeast.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #1211 main-n245387-913e7dc3e0eb-dirty: Thu Mar 11 10:38:49 PST 2021     root@freebeast.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/GENERIC  amd64 1400005 1400005

There is no mtx_destroy() so I guess you cannot unload this module.

sys/cam/cam_xpt.c
3169

I suspect that if we move that, we can get rid of the run interlock too... At best it saves extra wakeups in the rare case where races are lost if we move this... I'll do a separate commit for the done part.

3169

As discussed in the separate review, the wakeup can be outside of the lock safely in this case. The only consequence of losing the "race" here is that the other thread might grab the lock as soon as this one releases it and handle the work request and go to sleep before wakeup() runs (assume a preemption just before wakeup() as the worst case to make the race longer). All that happens then is that you wakeup the thread when there is nothing to do so it wakes up and goes right back to sleep again. However, by moving the wakeup() out from under the lock, you reduce lock contention since if the wakeup() resumes the thread on another CPU, it will then immediately contest on this lock and possibly block (again, assume a worst case of a preemption just _after_ wakeup() to exacerbate this race). For this reason, it's actually a fairly common pattern to do wakeup's after dropping the lock when it is feasible.

You could perhaps do the 'run' optimization from xpt_done() here as well where you only do the wakeup if the queue was empty before you inserted the new ccb into it. Note that this can in theory delay the wakeup() (e.g. if you get preempted before the wakeup in the original thread and meanwhile another thread enqueue's a second ccb and wouldn't do the wakeup, so both requests sit on the queue until the first thread resumes). The tradeoff there is potential latency vs the overhead of calling wakeup() which still has to go lock a sleepq chain spin lock to look for the right sleep queue to see if there is a thread to wake up or not. It's not clear to me what the right tradeoff there is, and it might be fine to just do the wakeup() unconditionally instead.

There is no mtx_destroy() so I guess you cannot unload this module.

MOD_UNLOAD always fails for cam.ko and it doesn't have any unload handling at all.

Move the wakeup outside the lock.

While classically there's a race if thread S is about to sleep and thread W
wants to wake it in general, in this case all 'lost races' are harmless.
We won't sleep with work in the work queue.

This revision now requires review to proceed.Mar 12 2021, 5:32 PM
sys/cam/cam_xpt.c
3169

Since async events are relatively rare, I suspect the other optimizations for more contended paths won't matter (even though it will add a tiny amount to the sleepq lock contention). I may try to get xpt_done_td to just handle both and allow sleeping for the async queue (they will be separate threads, so the sleeping won't delay other transactions). That would allow us to keep the same optimizations for both cases.

But there's already a couple of bugzilla bugs on this so I'd like to get it in as it is.

imp marked 3 inline comments as done.Mar 12 2021, 5:38 PM

BTW, this is the final code I intend to commit later today.

Tested again after Warner's last change; no issues:

freebeast(14.0-C)[1] uname -aUK
FreeBSD freebeast.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #1213 main-n245402-6385cabd5be6-dirty: Fri Mar 12 10:01:51 PST 2021     root@freebeast.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/GENERIC  amd64 1400005 1400005
This revision was not accepted when it landed; it landed in state Needs Review.Mar 12 2021, 8:50 PM
This revision was automatically updated to reflect the committed changes.