Page MenuHomeFreeBSD

Create root DMA tag and fix MBUS windows on DMA coherent platforms
ClosedPublic

Authored by mw_semihalf.com on Jun 14 2017, 10:50 PM.
Tags
Referenced Files
Unknown Object (File)
Tue, Apr 16, 9:07 PM
Unknown Object (File)
Sat, Apr 13, 5:16 PM
Unknown Object (File)
Mon, Apr 8, 3:21 AM
Unknown Object (File)
Sun, Apr 7, 11:43 AM
Unknown Object (File)
Feb 17 2024, 12:06 AM
Unknown Object (File)
Jan 19 2024, 10:20 PM
Unknown Object (File)
Jan 18 2024, 9:17 AM
Unknown Object (File)
Dec 29 2023, 6:07 AM

Details

Summary

Armada 38x SoCs, in order to work properly in IO-coherent mode,
requires an update of the MBUS windows attributesd.

This patch also configures nexus coherent dma tag, because all
busses and children devices have to inherit this setting in runtime.
The latter has to be executed as a sysinit (SI_SUB_DRIVERS type),
so that bus_dma_tag_create() can be executed properly.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

When a PL310 cache is used in a system that provides hardware
coherency, the entire outer cache operations are useless, and can be
skipped.

This is significantly incorrect/misleading/false sentence. All these outer cache operations are necessary for standard mapping operations, irrespective of coherency status.
Please, reword this part in commit.

The drain_writebuf() is not directly related to caches nor SOC coherency (at least I hope that Marvell coherency mechanism drains writebuf for DMA transactions). It control transaction visibility (for hardware/device observers).
That's why is used in bus_barrier(). But this patch and previously mentioned necessity of DEVICE -> SO remapping gives me big red exclamation light.
In short and in this context - this patch effectively nullifies bus_barrier(), causing that device and memory writes can be interleaved. Then DEVICE -> SO remapping causes that each write to device is followed by implicit
bus_barrier() (because SO).

I just guessing and I can be wrong, but this looks for me like papering over real problem - do you have these bugs confirmed by Marvell?

Hi!

Yes, it's all documented in Marvell errata for the support. Both their private and the mainline kernels had to be modified in a similar way. Just please take a look:
https://patchwork.kernel.org/patch/6993601/

I do not think this patch was accepted in mainline Linux -> http://lists.infradead.org/pipermail/linux-arm-kernel/2015-August/363953.html

But this one is -> https://github.com/torvalds/linux/commit/98ea2dba65932ffc456b6d7b11b8a0624e2f7b95
Please note that commit log uses "the outer cache sync operation is useless: and not a " the entire outer cache operations are useless".
Also note that this behavior is driven by "arm,io-coherent" property.

Do you have also confirmation / ptr to patch for DEVICE to SO remapping?
I'm asking because this can be related to skipping drain_writebuffer() (outer sync ) in bus_space_barrier(). And if that's true, then we need
another implementation of bus barrier...

In D11203#231935, @meloun-miracle-cz wrote:

I do not think this patch was accepted in mainline Linux -> http://lists.infradead.org/pipermail/linux-arm-kernel/2015-August/363953.html

But this one is -> https://github.com/torvalds/linux/commit/98ea2dba65932ffc456b6d7b11b8a0624e2f7b95
Please note that commit log uses "the outer cache sync operation is useless: and not a " the entire outer cache operations are useless".
Also note that this behavior is driven by "arm,io-coherent" property.

Event better. We were thinking how to replace the callbacks in the pl310 in a nice way. Are you ok with using "arm,io-coherent" property in pl310 in a similar way?

Do you have also confirmation / ptr to patch for DEVICE to SO remapping?

Yes, I saw numerous internal Marvell documents and email threads. Please check following commits:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/arm/mach-mvebu/coherency.c?h=v4.12-rc5&id=6a02734d420fca778554878d03017017537d92e1
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/arm/mach-mvebu/coherency.c?h=v4.12-rc5&id=c5379ba8fccd99d5f99632c789f0393d84a57805

Loading crypto engine was our test to check if the deadlocks occur.

I'm asking because this can be related to skipping drain_writebuffer() (outer sync ) in bus_space_barrier(). And if that's true, then we need
another implementation of bus barrier...

Well, there's been no ARMv7 platform using HW IO coherency in FreeBSD, so we can't tell for sure. Our changes are totally transparent to other platforms and generic code. Bear in mind, that this drain is not problematic, when MBUS transactions are set to non-coherent (current HEAD).

Actually, the drain_writebuffer does not seem to be problematic (at least it couldn't trigger a deadlock in our tests - crypto engines under huge load), but the one in arm_irq_memory_barrier (and of course one in busdma, but it was removed by previous patch). Touching L2, when handling irq's, to me seems actually as sort of workaround for some problems platform specific problems- 75 line justification comment convinces me it could be an actual hack ;)

In D11203#231935, @meloun-miracle-cz wrote:

I do not think this patch was accepted in mainline Linux -> http://lists.infradead.org/pipermail/linux-arm-kernel/2015-August/363953.html

But this one is -> https://github.com/torvalds/linux/commit/98ea2dba65932ffc456b6d7b11b8a0624e2f7b95
Please note that commit log uses "the outer cache sync operation is useless: and not a " the entire outer cache operations are useless".
Also note that this behavior is driven by "arm,io-coherent" property.

Event better. We were thinking how to replace the callbacks in the pl310 in a nice way. Are you ok with using "arm,io-coherent" property in pl310 in a similar way?

I'm fine with both solutions, with preference to "arm,io-coherent". Using standardized, documented way is always better :)

Do you have also confirmation / ptr to patch for DEVICE to SO remapping?

Yes, I saw numerous internal Marvell documents and email threads. Please check following commits:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/arm/mach-mvebu/coherency.c?h=v4.12-rc5&id=6a02734d420fca778554878d03017017537d92e1
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/arm/mach-mvebu/coherency.c?h=v4.12-rc5&id=c5379ba8fccd99d5f99632c789f0393d84a57805

Loading crypto engine was our test to check if the deadlocks occur.

Oki, thanks.

I'm asking because this can be related to skipping drain_writebuffer() (outer sync ) in bus_space_barrier(). And if that's true, then we need
another implementation of bus barrier...

Well, there's been no ARMv7 platform using HW IO coherency in FreeBSD, so we can't tell for sure. Our changes are totally transparent to other platforms and generic code.

Don't take me bad, I have no problem with this part of the patch - I only want to understand where problem is. And I think that having working bus_space_barrier() is also important for many drivers.

Bear in mind, that this drain is not problematic, when MBUS transactions are set to non-coherent (current HEAD).

Actually, the drain_writebuffer does not seem to be problematic (at least it couldn't trigger a deadlock in our tests - crypto engines under huge load), but the one in arm_irq_memory_barrier (and of course one in busdma, but it was removed by previous patch).

hmm, I afraid that this is related to probability only :(

Touching L2, when handling irq's, to me seems actually as sort of workaround for some problems platform specific problems- 75 line justification comment convinces me it could be an actual hack ;)

Yes and no :) It's an attempt to solve the described problem in generic way, but only works only for systems with external L2 cache.
The only real solution, readback of device interrupt mask/status register is actually implemented in some(most?) drivers so it's probably a time to remove this hack.

mw_semihalf.com edited the summary of this revision. (Show Details)

Do not remove cpufuncs.cf_l2cache_drain_writebuf, due to introducing "arm,io-coherent" property for PL310 in https://reviews.freebsd.org/D11245

mw_semihalf.com retitled this revision from Fix support of MBUS windows and L2 cache on Marvell DMA coherent platforms to Create root DMA tag and fix MBUS windows on DMA coherent platforms.
mw_semihalf.com edited the summary of this revision. (Show Details)
  • Modify nexus dma tag, following https://reviews.freebsd.org/D11202
  • Improve commit log
  • Change ddr_attr for A38x SoCs, depending on the compatible string in the ofwbus ("/")
This revision is now accepted and ready to land.Jun 21 2017, 1:33 PM
This revision was automatically updated to reflect the committed changes.