Page MenuHomeFreeBSD

mpi3mr: Fix major DMA handling bugs
AbandonedPublic

Authored by mav on Nov 14 2023, 8:57 PM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Jan 23, 6:40 PM
Unknown Object (File)
Sat, Jan 18, 10:20 PM
Unknown Object (File)
Sat, Jan 18, 9:52 PM
Unknown Object (File)
Dec 2 2024, 8:21 PM
Unknown Object (File)
Dec 2 2024, 4:56 PM
Unknown Object (File)
Nov 16 2024, 5:52 AM
Unknown Object (File)
Nov 16 2024, 2:15 AM
Unknown Object (File)
Nov 16 2024, 1:21 AM
Subscribers
None

Details

Summary
  • Remove 32-bit limitations from mpi3mr_parent_dmat and buffer_dmat. Obviously HBA these days must have 64bit data addresses support. There are still 32-bit limitations on minor some minor allocations, but those may be less obvious, so I'll leave it to somebody who know the hardware.
  • Pass BUS_DMA_NOWAIT to bus_dmamap_load_ccb(). It is not a great design, but the way code is written it has to be there. Remove incorrect EINPROGRESS handling there, unneeded with the BUS_DMA_NOWAIT.
  • Remove MAXPHYS usage. It is obsolete few years ago and should not be used. Just remove the limits, I don't think they are needed there.
Test Plan

Created ZFS pool of 60 disks. Run heavy write workload. Without the patch I/O freezes in seconds with SIM queue permanently frozen. With the patch it works fine.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

mav requested review of this revision.Nov 14 2023, 8:57 PM
mav created this revision.

Add BUS_DMA_NOWAIT to bus_dmamap_load() with mpi3mr_memaddr_cb() callback. They by definition must be synchronous.

I also suspect that most of bus_dmamem_alloc() calls could use BUS_DMA_WAITOK, but I haven't looked on the driver locking yet.

I've already fixed all of these in a different series (but didn't do the template work) that I've torture tested on netflix workload. Without the NOWAIT hack.

https://reviews.freebsd.org/D42588
https://reviews.freebsd.org/D42559
https://reviews.freebsd.org/D42543
https://reviews.freebsd.org/D42542
https://reviews.freebsd.org/D42541
https://reviews.freebsd.org/D42540
https://reviews.freebsd.org/D42539
https://reviews.freebsd.org/D42538
https://reviews.freebsd.org/D42537

I didn't know you were using this yet, or I'd have included you on the reviews. It also handles the real limitations of the hardware: for me it's 63 bits, but there's others with fewer bits decoded (this is in the flags word of the IOCFacts).

Created ZFS pool of 60 disks. Run heavy write workload. Without the patch I/O freezes in seconds with SIM queue permanently frozen. With the patch it works fine.

Yea, we had the same issue at Netflix,. which is why I did the patch series I did.

I like your template stuff, though, so maybe I could add that as another patch at the end of my series?

sys/dev/mpi3mr/mpi3mr.c
4915

Ah, I also didn't catch these two...

https://reviews.freebsd.org/D42605

and I have the author listed as you.

sys/dev/mpi3mr/mpi3mr_app.c
361

Hmmm, I didn't do these in my changes (for mpi3mr_memaddr_cb). They work w/o it, but only because we have plenty of memory. BUS_DMA_NOWAIT ensures there's nothing deferred, but we still have the ENOMEM case to worry about... though only if we bounce, and only if one of the bouncers returned that (I don't see any that do that: they just return EINPROGRESS).

mps/mpr both have this same pattern as well: They both call bus_dmamap_load for cases like this with 0 flags, and no error checking to see if the load succeeded or failed, though we know their bus_dmamap_alloc succeeded, so that's likely not a big deal. Without bouncing, there's nothing to defer, so no error possible. Still, BUS_DMA_NOWAIT wouldn't hurt here (and also wouldn't hurt back ported to mpr/mps).

sys/dev/mpi3mr/mpi3mr_app.c
361

https://reviews.freebsd.org/D42606 has mpi3mr fixed, but not mpr/mps.

Abandoning this review in favor of @imp ones.