Page MenuHomeFreeBSD

WIP/RFC: another busdma fix for small bounce transfers
ClosedPublic

Authored by mhorne on Nov 27 2024, 5:23 PM.
Tags
None
Referenced Files
F132462111: D47807.diff
Fri, Oct 17, 3:28 AM
Unknown Object (File)
Thu, Oct 16, 9:58 AM
Unknown Object (File)
Thu, Oct 9, 9:39 PM
Unknown Object (File)
Wed, Oct 8, 9:05 PM
Unknown Object (File)
Thu, Oct 2, 5:47 AM
Unknown Object (File)
Thu, Oct 2, 2:43 AM
Unknown Object (File)
Wed, Oct 1, 10:57 PM
Unknown Object (File)
Wed, Oct 1, 12:14 AM

Details

Summary

More fallout from a77e1f0f81df.

When the tag has an alignment requirement, we should clamp sgsize to be
no larger than the remaining buflen. Otherwise, we will transfer more
than is required, and this can manifest as data corruption.

The issue is observed on powerpc as noted in the pull request:
https://github.com/freebsd/freebsd-src/pull/1415

I also observe the issue locally on riscv hardware, with an 8-byte
transfer having 64-byte alignment.

Try as I might, I cannot articulate the reason for rounding sgsize up to
the tag's alignment, at least in the current code. In fact, it is not
done on arm or arm64 (the latter removed by mmel@ in f635cef2a420).

I speculate that the original purpose of this is now achieved by the line:

sgsize = MIN(buflen, PAGE_SIZE - (curaddr & PAGE_MASK));

I find this code very difficult to reason about, nor is it easy to
experiment with. The fix I have here should be the safest fix, but it
may not be the most correct. This code is fragile and I would benefit
from other knowledgable eyes on it.

Thoughts? What is the correct action to take here?

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

I'm sorry for the delayed follow-up. I'm trying to understand the purpose of the roundup. It appears to have been introduced by commit daf6545e6158f; the implication seems to be that some drivers depend on the property that "the relative alignment of two consecutive bytes in the I/O stream have a difference of 1 even if they are not physically contiguous."

I don't really understand why that property is important, but assuming so, I think your patch is probably the right thing to do.

I'm sorry for the delayed follow-up. I'm trying to understand the purpose of the roundup. It appears to have been introduced by commit daf6545e6158f; the implication seems to be that some drivers depend on the property that "the relative alignment of two consecutive bytes in the I/O stream have a difference of 1 even if they are not physically contiguous."

I don't really understand why that property is important, but assuming so, I think your patch is probably the right thing to do.

It's already been a long while, but I am going to punt on this larger question for another day still.

The patch at hand demonstrably fixes two erroneous cases; mine and one described in the GitHub PR.

Adding the seatbelt to ensure we do not exceed buflen can at worst be redundant, but never incorrect. With that thinking in mind I plan to just commit what is here this week.

I'm trying to think of a case when the roundup would ever be correct. That is, there are two cases after the previous expression:

		sgsize = MIN(buflen, PAGE_SIZE - (curaddr & PAGE_MASK));

Either 1) sgsize == buflen, or 2) sgsize == PAGE_SIZE - (curaddr & PAGE_MASK)

Your patch turns case 1) into a no-op as it effectively undoes the roundup. Case 2) seems worse though. This means that we can now expand the S/G entry *off the end of the page* and randomly read/write data on the next physical page of RAM. There is no way that is correct. I wonder if this was meant to be a rounddown to keep segments a multiple of the alignment instead of a roundup? However, in that case you'd never finish a buffer whose size wasn't a multiple of the alignment.

Aha! See this part of the commit log from daf6545e6158f:

Now all contiguous regions returned from bus-dma will be aligned to the
alignment constraint and all but the last region are guaranteed to be
a multiple of the alignment in length.

I think if that is the desired behavior, then the correct fix instead is to replace the roundup2 line with this:

if (sgsize != buflen)
    sgsize = rounddown2(sgsize, dmat->common.alignment);

That will give the desired property. It should never have been a roundup2().

Hmm, well, actually, I think I'm not quite correct as we would only run off the page in case 2) if BUS_DMA_KEEP_PG_OFFSET is true (I think) and that wasn't an option when the original commit was made. I think this change as-is is at least an improvement. I am still worried about the case of a non-zero page offset that results in a remaining length (sgsize) that is not a multiple of the alignment, but BUS_DMA_KEEP_PG_OFFSET is set so we are not just copying to offset 0 of the page. Maybe that case can't happen though? The sole user of that flag seems to use an alignment of 1 if there is more than one segment.

In D47807#1182659, @jhb wrote:

I'm trying to think of a case when the roundup would ever be correct. That is, there are two cases after the previous expression:

		sgsize = MIN(buflen, PAGE_SIZE - (curaddr & PAGE_MASK));

Either 1) sgsize == buflen, or 2) sgsize == PAGE_SIZE - (curaddr & PAGE_MASK)

Your patch turns case 1) into a no-op as it effectively undoes the roundup. Case 2) seems worse though. This means that we can now expand the S/G entry *off the end of the page* and randomly read/write data on the next physical page of RAM. There is no way that is correct. I wonder if this was meant to be a rounddown to keep segments a multiple of the alignment instead of a roundup? However, in that case you'd never finish a buffer whose size wasn't a multiple of the alignment.

Aha! See this part of the commit log from daf6545e6158f:

Now all contiguous regions returned from bus-dma will be aligned to the
alignment constraint and all but the last region are guaranteed to be
a multiple of the alignment in length.

I think if that is the desired behavior, then the correct fix instead is to replace the roundup2 line with this:

if (sgsize != buflen)
    sgsize = rounddown2(sgsize, dmat->common.alignment);

That will give the desired property. It should never have been a roundup2().

In D47807#1182660, @jhb wrote:

Hmm, well, actually, I think I'm not quite correct as we would only run off the page in case 2) if BUS_DMA_KEEP_PG_OFFSET is true (I think) and that wasn't an option when the original commit was made. I think this change as-is is at least an improvement. I am still worried about the case of a non-zero page offset that results in a remaining length (sgsize) that is not a multiple of the alignment, but BUS_DMA_KEEP_PG_OFFSET is set so we are not just copying to offset 0 of the page. Maybe that case can't happen though? The sole user of that flag seems to use an alignment of 1 if there is more than one segment.

Thank you for the analysis. In my previous study of the code I approached some similar understandings, but found it hard to say concretely what is/is not possible in these scenarios where BUS_DMA_KEEP_PG_OFFSET is involved. The idea that it should have been a rounddown2() from the start is particularly provocative.

For now I will take the most conservative action and commit what is here as an immediate fix, with the intent to revisit this question more completely at another time.

This revision was not accepted when it landed; it landed in state Needs Review.Aug 9 2025, 6:43 PM
This revision was automatically updated to reflect the committed changes.

To be clear, the roundup2() is based on the assumption that 1) the copy always starts to/from offset 0 of a bounce page, so we always have a full PAGE_SIZE bytes of "room" to use in a bounce page and 2) a desire for the lengths of individual S/G entries to be a multiple of the alignment (the quote from the commit log). Given those, it's actually ok to roundup and copy up to align - 1 bytes from the next "source" page into the bounce buffer.

(Sorry, I had written this before, but failed to click Submit it seems)