
Generic interface driver for dmaengine devices
Needs Revision · Public

Authored by rajesh1.kumar_amd.com on Jun 17 2020, 5:23 AM.
Details

Reviewers
mav
cem
Summary

This patch adds a new driver that acts as a generic interface for dmaengine-like
devices.

This driver can be used by test drivers and applications to request DMA
channels, release them, and initiate DMA operations on the acquired channels
without knowing the specifics of the underlying vendor-specific hardware.
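As a rough illustration, a consumer of such a KPI might look like the following sketch. All names here are hypothetical, chosen for illustration only; they are not the actual functions in this patch.

```c
/* Hypothetical pseudocode; these names are illustrative only and do
 * not match the patch's actual KPI. */
chan = dmaengine_channel_get();                  /* request a channel   */
dmaengine_memcpy(chan, dst, src, len, cb, arg);  /* start a DMA copy    */
/* ... completion callback cb() fires when the copy is done ... */
dmaengine_channel_put(chan);                     /* release the channel */
```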

Event Timeline

mav requested changes to this revision. Jun 17 2020, 1:57 PM

I would be happy if we had a good abstraction for standalone DMA engines. Right now we are using the IOAT engine in our products, but I'd be happy if there were a decent option from the AMD side as well. While this code is a good start, it is nowhere near something we could already use.

sys/dev/dmaengine/dmaengine.c
96

Instead of this racy approach there should be static initializers for both the TAILQ and mutex.

100

You cannot use M_WAITOK under the non-sleepable lock. Move it out.

101

There cannot be an error if you use M_WAITOK.

108

What's the point of this device? What driver will attach it or why is it needed?

166

I am still at a loss as to why you need to have separate devices and all this bus prettiness.

190

Why do you use the _SAFE variant here while the TAILQ is not modified?

201

The semantics of this tcnt are something else, not the total number of channels described in the header file.

271

If you look at the ioat(4) KPI, it has mechanisms to aggregate multiple separate transfers so that they are queued atomically, all or none, and potentially with lower overhead. Page-by-page calls in the case of memory fragmentation can be expensive and must be heavily optimized.
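For comparison, the batched-submission pattern in the ioat(4) KPI looks roughly like the sketch below (kernel code, not compilable standalone; the calls approximate those in sys/dev/ioat/ioat.h, with error handling elided):

```c
/* Sketch only: everything queued between acquire and release is
 * submitted as one batch; the hardware doorbell is rung once, in
 * ioat_release(), rather than once per page. */
bus_dmaengine_t eng = ioat_get_dmaengine(0, M_WAITOK);

ioat_acquire(eng);
for (i = 0; i < npages; i++)
	ioat_copy(eng, dst[i], src[i], PAGE_SIZE, done_cb, arg,
	    i == npages - 1 ? DMA_INT_EN : 0);
ioat_release(eng);
```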

sys/dev/dmaengine/dmaengine.h
54

How is the application supposed to know the number of channels? Why can't channels be shared between applications? How should a channel be chosen if there are several DMA controllers on different NUMA domains?

You obviously know about the ioat(4) driver/KPI, and you should see that I added a method there to get the NUMA domain for a channel. It is far from perfect, but it is at least some way to let the application make a grounded decision. Alternatively, the KPI could somehow allow the application to not bother about it and still get reasonable results.

This revision now requires changes to proceed. Jun 17 2020, 1:57 PM

FYI, sys/dev/xdma sort of attempts to be a generic DMA engine interface, but the design does not seem suitable outside of specific hardware.

Thanks @mav and @cem for your comments.

Recently, we enabled a driver for the AMD DMA engine for a use case. In that process, I came across the "ioat" driver for Intel DMA engines. I see that the interfaces defined by ioat can be used for our purpose as well, so I thought of writing a generic DMA interface driver for abstraction. I did a bit of searching for prior work, but missed this "xdma". Thanks @cem for pointing it out.

Initially, we just needed KPIs to get a channel reference, release a channel reference, and submit a memcpy operation. That's why I limited it to those KPIs, with the intent to extend further as needed.

@mav, I will go through the "xdma" code as well and see whether I can use it for my driver. Can you please let me know whether you think it can fit "ioat" as well?

Hi @mav / @cem

I have gone through the "xdma" driver and the associated drivers and got some idea of the design and flow. I'm summarizing my learnings here.

DMA drivers using xdma:

  • sys/dev/altera/msgdma/msgdma.c
  • sys/dev/altera/softdma/softdma.c
  • sys/dev/xdma/controller/pl330.c
  • sys/dev/xilinx/axidma/axidma.c
  • sys/mips/ingenic/jz4780_pdma.c

Client/test drivers using xdma (to use the underlying DMA engines mentioned above):

  • sys/dev/altera/if_atse.c (network use case)
  • sys/dev/flash/cqspi.c (storage use case)
  • sys/dev/xilinx/if_xae.c (network use case)
  • sys/mips/ingenic/jz4780_aic.c (audio use case - it looks like this implementation is not complete)

Generic flow (sorry for the long text here, rather than a flow chart):

The client driver calls the following interfaces in the init path:

  • "xdma_get" (or "xdma_ofw_get") to get a reference to an "xdma_controller". This represents a DMA controller (not just a specific channel).
  • "xdma_channel_alloc" to get a reference to an "xdma_channel". This represents an individual channel in the DMA controller. The "xdma_controller" above has a list to hold these channel references. The client driver can request the channels it needs, and it is the DMA driver's responsibility to limit/share channel requests based on its channel count.
  • "xdma_setup_intr" to register the client driver's interrupt handler with xdma. When a DMA operation completes, the DMA driver's interrupt handler calls "xdma_callback", which calls the client's registered interrupt handler to notify it of the completion. This interrupt handler is channel specific.
  • "xdma_request" to preallocate channel-specific descriptors (which will be used for hardware submissions). This looks optional; DMA drivers can handle the hardware submissions accordingly.
  • "xdma_prep_sg" to configure the scatter-gather list for the channel descriptors. This is channel specific.
  • "xdma_control" to enable the channels. This looks optional; DMA drivers can do it their own way.

The detach path has corresponding cleanup interfaces to free the allocations made in the init path.

The client driver calls the following interfaces in the data path:

  • "xdma_enqueue" (mbuf version for the network use case, bio version for the storage use case) to enqueue a DMA operation request to the DMA engine.
  • "xdma_queue_submit" to start hardware submission of the enqueued requests.
  • "xdma_dequeue" to clear a request after the DMA driver notifies completion.
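The init- and data-path steps above can be sketched as follows (kernel pseudocode; argument lists abbreviated and approximate, see sys/dev/xdma/xdma.h for the real signatures):

```c
/* Init path */
xdma  = xdma_ofw_get(dev, "dma");            /* controller reference  */
xchan = xdma_channel_alloc(xdma, caps);      /* one channel on it     */
xdma_setup_intr(xchan, done_intr, sc, &ih);  /* completion callback   */
xdma_prep_sg(xchan, ...);                    /* descriptor/SG setup   */

/* Data path */
xdma_enqueue_mbuf(xchan, &m, ..., XDMA_MEM_TO_DEV);
xdma_queue_submit(xchan);                    /* start HW submission   */

/* done_intr() fires via xdma_callback() on completion; the client
 * then dequeues the finished requests. */
while (xdma_dequeue_mbuf(xchan, &m, &status) == 0)
	m_freem(m);
```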

I have just given a brief idea of the design and flow here. I still have questions before I understand things more deeply.

Having said this, we can use "xdma" to abstract the Intel and AMD DMA drivers; "xdma" would handle the DMA requests, submissions, and completions accordingly. But this needs considerable design/code changes and validation. Also, "xdma" is currently used only by FDT devices, not PCI devices.

So, I am looking for your comments here to decide whether to abandon this patch and use "xdma" itself, or to see whether this patch can be of use in some way. Thanks.