Page MenuHomeFreeBSD

arm64: Move "segments" array from bus_dma_tag to bus_dmamap.
Needs ReviewPublic

Authored by mindal_semihalf.com on Jun 26 2019, 1:22 PM.

Details

Reviewers
andrew
mw
manu
Summary

When a dma load operation is performed the description of loaded segments is passed to callback function in an array.
Since said array is stored as a part of bus_dma_tag structure it is possible to overwrite it during concurrent loads.
Fix that by moving it to the bus_dmamap.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

Isn't this also a problem on amd64?

royger added a subscriber: royger.Jun 26 2019, 2:43 PM

I'm not sure I understand the motivation behind that change, is this fixing an existing issue with a specific driver?

Can you be more specific about the flow that can lead to overwrite the segs array?

Also if such a change is required I think it should be likely done to all arches, not only arm64.

sys/arm64/arm64/busdma_bounce.c
319

This alloc should be done in alloc_dmamap if you move the segments array from dmatag to dmamap

Isn't this also a problem on amd64?

I'm not sure, but after having a quick look at busdma_dmar.c I think that it could be.

I'm not sure I understand the motivation behind that change, is this fixing an existing issue with a specific driver?

I found it while writing driver for a crypto accelerator, but you should be able to observe it with existing drivers.

Can you be more specific about the flow that can lead to overwrite the segs array?

Basically if you have a lot of threads doing DMA loads on the same tag concurrently there is a small chance of the segs array being overwritten before it is read in the callback.
It is a rare race, but reproducible.

Also if such a change is required I think it should be likely done to all arches, not only arm64.

Good question. I just found that r289862 contains a similar fix for arm.

royger added a comment.EditedJun 26 2019, 3:37 PM

Can you be more specific about the flow that can lead to overwrite the segs array?

Basically if you have a lot of threads doing DMA loads on the same tag concurrently there is a small chance of the segs array being overwritten before it is read in the callback.
It is a rare race, but reproducible.

Hm, I'm not sure I see how that's supposed to happen. busdma_swi already takes a lock (dmat->common.lockfunc) to prevent issuing multiple loads on the same tag, thus there should be no overwriting while the caller callback is using tag->segs?

Can you be more specific about the flow that can lead to overwrite the segs array?

Basically if you have a lot of threads doing DMA loads on the same tag concurrently there is a small chance of the segs array being overwritten before it is read in the callback.
It is a rare race, but reproducible.

Hm, I'm not sure I see how that's supposed to happen. busdma_swi already takes a lock (dmat->common.lockfunc) to prevent issuing multiple loads on the same tag, thus there should be no overwriting while the caller callback is using tag->segs?

Well, I'm not sure how dusdma_swi can help us in this case.
If you go through the flow starting at bus_dmamap_load, then going into bounce_bus_dmamap_load_buffer no lock is acquired there.

Can you be more specific about the flow that can lead to overwrite the segs array?

Basically if you have a lot of threads doing DMA loads on the same tag concurrently there is a small chance of the segs array being overwritten before it is read in the callback.
It is a rare race, but reproducible.

Hm, I'm not sure I see how that's supposed to happen. busdma_swi already takes a lock (dmat->common.lockfunc) to prevent issuing multiple loads on the same tag, thus there should be no overwriting while the caller callback is using tag->segs?

Well, I'm not sure how dusdma_swi can help us in this case.
If you go through the flow starting at bus_dmamap_load, then going into bounce_bus_dmamap_load_buffer no lock is acquired there.

Right, the lock is only used for the deferred callback.

I think this is fine, but should be done for x86 also at least (that's where arm64 busdma code is coming from). This expected behaviour should also be mentioned on the busdma man page.

I think the general expectation of busdma code is that you serialize loads when using the same tag, but I cannot find an explicit reference to it on the man page, and IMO the locking section is too vague and doesn't explicitly say that you are not supposed to issue concurrent load calls using the same tag.

Can you be more specific about the flow that can lead to overwrite the segs array?

Basically if you have a lot of threads doing DMA loads on the same tag concurrently there is a small chance of the segs array being overwritten before it is read in the callback.
It is a rare race, but reproducible.

Hm, I'm not sure I see how that's supposed to happen. busdma_swi already takes a lock (dmat->common.lockfunc) to prevent issuing multiple loads on the same tag, thus there should be no overwriting while the caller callback is using tag->segs?

Well, I'm not sure how dusdma_swi can help us in this case.
If you go through the flow starting at bus_dmamap_load, then going into bounce_bus_dmamap_load_buffer no lock is acquired there.

Right, the lock is only used for the deferred callback.
I think this is fine, but should be done for x86 also at least (that's where arm64 busdma code is coming from). This expected behaviour should also be mentioned on the busdma man page.
I think the general expectation of busdma code is that you serialize loads when using the same tag, but I cannot find an explicit reference to it on the man page, and IMO the locking section is too vague and doesn't explicitly say that you are not supposed to issue concurrent load calls using the same tag.

Ok, so as you said we have two ways to proceed from here:

  1. Modify the manual to explicitly require the user to serialize all loads on the same tag.

Do we know if all existing drivers follow that rule?

  1. Move the segmets array on all architectures.

I could prepare the patch, but frankly I don't feel competent enough to make such design decision.

Can you be more specific about the flow that can lead to overwrite the segs array?

Basically if you have a lot of threads doing DMA loads on the same tag concurrently there is a small chance of the segs array being overwritten before it is read in the callback.
It is a rare race, but reproducible.

Hm, I'm not sure I see how that's supposed to happen. busdma_swi already takes a lock (dmat->common.lockfunc) to prevent issuing multiple loads on the same tag, thus there should be no overwriting while the caller callback is using tag->segs?

Well, I'm not sure how dusdma_swi can help us in this case.
If you go through the flow starting at bus_dmamap_load, then going into bounce_bus_dmamap_load_buffer no lock is acquired there.

Right, the lock is only used for the deferred callback.
I think this is fine, but should be done for x86 also at least (that's where arm64 busdma code is coming from). This expected behaviour should also be mentioned on the busdma man page.
I think the general expectation of busdma code is that you serialize loads when using the same tag, but I cannot find an explicit reference to it on the man page, and IMO the locking section is too vague and doesn't explicitly say that you are not supposed to issue concurrent load calls using the same tag.

Ok, so as you said we have two ways to proceed from here:

  1. Modify the manual to explicitly require the user to serialize all loads on the same tag.

Do we know if all existing drivers follow that rule?

  1. Move the segmets array on all architectures.

I could prepare the patch, but frankly I don't feel competent enough to make such design decision.

Right, I completely understand. I don't think however that modifying busdma behaviour only for arm64 is the right solution. Drivers built and tested on arm64 might then fail to work for other arches, and no one would know why.

I think that ATM modifying the man page to explicitly state that concurrent loads using the same tag are not supported is the right way forward. Note that I'm not the maintainer of busdma, so maybe others have different opinions.

Can you be more specific about the flow that can lead to overwrite the segs array?

Basically if you have a lot of threads doing DMA loads on the same tag concurrently there is a small chance of the segs array being overwritten before it is read in the callback.
It is a rare race, but reproducible.

Hm, I'm not sure I see how that's supposed to happen. busdma_swi already takes a lock (dmat->common.lockfunc) to prevent issuing multiple loads on the same tag, thus there should be no overwriting while the caller callback is using tag->segs?

Well, I'm not sure how dusdma_swi can help us in this case.
If you go through the flow starting at bus_dmamap_load, then going into bounce_bus_dmamap_load_buffer no lock is acquired there.

Right, the lock is only used for the deferred callback.
I think this is fine, but should be done for x86 also at least (that's where arm64 busdma code is coming from). This expected behaviour should also be mentioned on the busdma man page.
I think the general expectation of busdma code is that you serialize loads when using the same tag, but I cannot find an explicit reference to it on the man page, and IMO the locking section is too vague and doesn't explicitly say that you are not supposed to issue concurrent load calls using the same tag.

Ok, so as you said we have two ways to proceed from here:

  1. Modify the manual to explicitly require the user to serialize all loads on the same tag.

Do we know if all existing drivers follow that rule?

  1. Move the segmets array on all architectures.

I could prepare the patch, but frankly I don't feel competent enough to make such design decision.

Right, I completely understand. I don't think however that modifying busdma behaviour only for arm64 is the right solution. Drivers built and tested on arm64 might then fail to work for other arches, and no one would know why.

Interestingly enough on arm and powerpc that array is stored as a part of dma map structure, so we might run into that anyway.
Anyway I agree with you that we need to have the same behavior across architectures.

Interestingly enough on arm and powerpc that array is stored as a part of dma map structure, so we might run into that anyway.
Anyway I agree with you that we need to have the same behavior across architectures.

Yes, I know, and I think that's wrong. TBH I didn't realize, or else I would have complained. Let's try to avoid expanding those behavioural differences across more arches.