Paths

Table of Contentst

Enable L1 Dcache prefetch for Cortex A9 CPUs
AbandonedPublic
Actions

Authored by zbb on Mar 2 2017, 2:30 PM.

Details

Reviewers

andrew
mmel
skra

Summary

Some of the processors do not have this feature
enabled by default. Turn it on as L1 prefetch will
result in significant performance improvement.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Skipped

Unit

Tests Skipped

Build Status

Buildable 9207
Build 9654: CI src build	Jenkins

Event Timeline

zbb updated this revision to Diff 25886.Mar 2 2017, 2:30 PM

zbb retitled this revision from to Enable L1 Dcache prefetch for Cortex A9 CPUs.

zbb updated this object.

zbb edited the test plan for this revision. (Show Details)

zbb added reviewers: skra, mmel, andrew.

zbb set the repository for this revision to rS FreeBSD src repository - subversion.

zbb added a subscriber: ARM.

Herald added a subscriber: imp. · View Herald TranscriptMar 2 2017, 2:30 PM

zbb added a subscriber: mw_semihalf.com.Mar 2 2017, 2:40 PM

See Erratum 571620, 719331, 719332, 751473 and probably more others.

Right place for proper CPU setup (including recommended errata fixes) is bootloader, not a boot stage of OS.

Please take in account:

on many boards, OS starts in in non-secure mode with disabled access to ACTRL (Pandaboard is one example).
effect of NS write to protected ACTRL is not consistent across for Cortex family, some CPU's ignore writes, some generates exception (A8, and probably some older A9).
the current code writes to ACTRL only if bits absolutely necessary for FreeBSD run are not set. In this case, it's irrelevant if we crashes here, or few instruction later.

This revision now requires changes to proceed.Mar 2 2017, 5:15 PM

In D9864#203677, @meloun-miracle-cz wrote:

See Erratum 571620, 719331, 719332, 751473 and probably more others.

Right place for proper CPU setup (including recommended errata fixes) is bootloader, not a boot stage of OS.

Please take in account:

on many boards, OS starts in in non-secure mode with disabled access to ACTRL (Pandaboard is one example).

effect of NS write to protected ACTRL is not consistent across for Cortex family, some CPU's ignore writes, some generates exception (A8, and probably some older A9).

the current code writes to ACTRL only if bits absolutely necessary for FreeBSD run are not set. In this case, it's irrelevant if we crashes here, or few instruction later.

Thanks for the detailed explanation. Closing this revision then.

@meloun-miracle-cz just to consider all possibillities. Do you have any idea whatsoever how this change could be inserted into the OS (in case there is no possibility to modify firmware), even in form of an option or platform-specific stuff?

There is the PLATFORM code to handle differences between different ARM based SoCs.

Yep, fully agree. From my point of view, platform_late_init() (for BP) is best place for performance tweaks. At this point, printf() works and cpuinfo is initialized, so you can check proper CPU revision, print some warning about potentially dangerous action, etc...
For AP, you can use standard platform_mp_start_ap(), but many ACTRL bits are shared between CPU's. So, typically, it's sufficient to set up only BP.

Unfortunately, platform_late_init() is called to late for majority of errata fixups - in this case, i think that only real safe solution is bootloader fix.

In D9864#204162, @meloun-miracle-cz wrote:

Yep, fully agree. From my point of view, platform_late_init() (for BP) is best place for performance tweaks. At this point, printf() works and cpuinfo is initialized, so you can check proper CPU revision, print some warning about potentially dangerous action, etc...
For AP, you can use standard platform_mp_start_ap(), but many ACTRL bits are shared between CPU's. So, typically, it's sufficient to set up only BP.

Unfortunately, platform_late_init() is called to late for majority of errata fixups - in this case, i think that only real safe solution is bootloader fix.

Thanks for your answers and suggestions.
I can see few problems here, please correct me if I'm wrong.

We could modify ACTRL bits in the platform_late_init() but this applies to the primary CPU only. Other CPUs will not be affected.
Secondary CPUs do not have their bits modified anywhere outside init_secondary(). platform_mp_start_ap() brings up secondary CPUs and is not executed by them.

Therefore the main place to modify ACTRL for all CPUs is still cpuinfo_get_actlr_modifier() + reinit_mmu().
Firmware cannot be modified ;-). That is why I'm looking for other options to have this change in the mainline FreeBSD.

zbb reclaimed this revision.Mar 6 2017, 10:55 AM

This revision now requires changes to proceed.Mar 6 2017, 10:55 AM

BTW. How do you expect u-boot to turn this bit on if it always works in UP?

My suggestion at this point would be to turn on L1 prefetch after checking CPU revision to ensure that we don't drive into errata.
What is your opinion?

I'm sorry about my platform_mp_start_ap() mistake.
I understand that cpuinfo_get_actlr_modifier() looks like most direct (and most easier) way how to make this. But again, this setting must be platform dependent.
Assume that you have 2 different boards (with different bootloaders) but with exactly same CPU. One board start OS in secure world, one in non-secure. . Then, write to ACTRL is OK for first one, but may cause exception on second.

IMO, we can add another platform method, say platform_late_init_ap(), called from here https://svnweb.freebsd.org/base/head/sys/arm/arm/mp_machdep.c?view=markup#l182.

Is this sufficient for you ?

In D9864#204946, @meloun-miracle-cz wrote:

I'm sorry about my platform_mp_start_ap() mistake.
I understand that cpuinfo_get_actlr_modifier() looks like most direct (and most easier) way how to make this. But again, this setting must be platform dependent.
Assume that you have 2 different boards (with different bootloaders) but with exactly same CPU. One board start OS in secure world, one in non-secure. . Then, write to ACTRL is OK for first one, but may cause exception on second.

IMO, we can add another platform method, say platform_late_init_ap(), called from here https://svnweb.freebsd.org/base/head/sys/arm/arm/mp_machdep.c?view=markup#l182.

Is this sufficient for you ?

Hello. Thanks for your help with that.
In general, I don't insist on adding this change to all Cortex-A9 CPUs and I acknowledge your arguments.
Another platform-dependent method is OK for us. We need something that will be executed on both primary and secondary CPUs. It would be best if this stays in a common place for both primary and secondary CPUs (such as cpu_setup() is common for CPU0 and CPUx).
What do you think?

In D9864#204948, @zbb wrote:
In general, I don't insist on adding this change to all Cortex-A9 CPUs and I acknowledge your arguments.
Another platform-dependent method is OK for us. We need something that will be executed on both primary and secondary CPUs. It would be best if this stays in a common place for both primary and secondary CPUs (such as cpu_setup() is common for CPU0 and CPUx).
What do you think?

Yep, I have no problem with this {if cpu_setup() is new platform method). I prefer to call this new function in the place where cpuinfo is initialized and printf is working.

In D9864#204949, @meloun-miracle-cz wrote:

In D9864#204948, @zbb wrote:
In general, I don't insist on adding this change to all Cortex-A9 CPUs and I acknowledge your arguments.
Another platform-dependent method is OK for us. We need something that will be executed on both primary and secondary CPUs. It would be best if this stays in a common place for both primary and secondary CPUs (such as cpu_setup() is common for CPU0 and CPUx).
What do you think?

Yep, I have no problem with this {if cpu_setup() is new platform method). I prefer to call this new function in the place where cpuinfo is initialized and printf is working.

I have another idea. Please let me know your opinion.

Instead of creating another platform method that will be used only by A38X, we will add SYSINIT entry to mv_machdep.c (ifdef ARMADA38X)
This SYSINIT will be called after SMP is initialized (in case of multi-core kernel) and will set L1 prefetch bit for all CPUs using smp_rendezvous() function.
We will comment add vast commentary to this sysinit, why we are doing this, etc.

What do you think? Is this a better idea or we should rather proceed with the new global platform method?

Ahh, I see now, Marvell subtree is not fully converted to platform model :) . But trust me, conversion its not that hard and makes your life much easier.
Also, there is global effort to make single generic kernel for all supported armv6 boards, and #ifdef SoC only technique is in direct direct contradiction with this goal.
See allwinner subtree, mainly aw_machdep.c. This platform supports numerous board with single kernel in 'right' way, without #ifdef hell.

But anyway, because SYSINIT() based proposal doesn't 'collide' with other boards/platforms, I have no objections. Now platform method seems more flexible, but that's just my personal opinion.

In D9864#205105, @meloun-miracle-cz wrote:

Ahh, I see now, Marvell subtree is not fully converted to platform model :) . But trust me, conversion its not that hard and makes your life much easier.
Also, there is global effort to make single generic kernel for all supported armv6 boards, and #ifdef SoC only technique is in direct direct contradiction with this goal.
See allwinner subtree, mainly aw_machdep.c. This platform supports numerous board with single kernel in 'right' way, without #ifdef hell.

But anyway, because SYSINIT() based proposal doesn't 'collide' with other boards/platforms, I have no objections. Now platform method seems more flexible, but that's just my personal opinion.

Armada stuff provides platform_late_init, etc. but the problem is that in order to set up this bit on all CPUs we need to call platform method on each of them. We would need to add another platform_<something>_init for all platforms and call it from both primary and secondary CPUs (whereas platform_late_init is called only on the primary CPU). That is why I suggested SYSINIT as it sounds equally hackish and yet less expansive.

In D9864#205185, @zbb wrote:

Armada stuff provides platform_late_init, etc.

platform_late_init() is only transitional hack, full platform model needs FDT_PLATFORM_DEF() macro.

In D9864#205185, @zbb wrote:

but the problem is that in order to set up this bit on all CPUs we need to call platform method on each of them. We would need to add another platform_<something>_init for all platforms and call it from both primary and secondary CPUs (whereas platform_late_init is called only on the primary CPU). That is why I suggested SYSINIT as it sounds equally hackish and yet less expansive.

That's why I pointed to aw_machdep.c. Look to: https://svnweb.freebsd.org/base/head/sys/arm/allwinner/aw_machdep.c?revision=311237&view=markup#l167
With this style, you can mix 'platform specific' and 'platform universal' functions, you can save platform/cpu type to variable and then use it at runtime....

And with combination of new platform_late_init_ap() method (defaulting to nop) (see: https://svnweb.freebsd.org/base/head/sys/arm/arm/platform_if.m?limit_changes=0&view=markup#l57 for example), everything whats you need is single function (cpu_init)( from above) declared in appropriate platform methods table (assuming that same function can be called directly from platform_late_init()).

Reworked to use platform call.

zbb added parent revisions: D10682: Introduce platform CPU init for ARM, D10683: Introduce platform CPU info container for ARM. Use it for ACTLR..May 11 2017, 4:01 PM

Harbormaster failed remote builds in B9207: Diff 28247!May 11 2017, 4:15 PM

andrew added inline comments.May 11 2017, 6:12 PM

sys/arm/mv/mv_machdep.c
263	You could have this use `cp15_actlr_get`, `cp15_actlr_get` in the `platform_late_init` handler (with my suggestion in D10682).

zbb added inline comments.May 12 2017, 10:29 AM

sys/arm/mv/mv_machdep.c
263	We could use `cp15_actlr_get` + `cp15_actlr_get' but we would need to explicitly ensure that` platform_cpu_init` or any future equivalent is called after `reinit_mmu` which modifies the default settings of ACTLR and exchange them to the default hard-coded ones. So long story-short. We need to call this before `reinit_mmu` so that we can change mapping types and we need to call this after `reinit_mmu` to be able to use actlr_get/set sequence. This makes no sense. I would rather assume that we call this before the generic settings are applied as the generic settings close the windows for modifications.

andrew added inline comments.May 12 2017, 10:40 AM

sys/arm/mv/mv_machdep.c
263	I don't see this as a problem as we already need to do this to ensure we don't change the existing fields, other than the fields we wish to change. It would also be nice if we could have macros for the magic bits.

zbb added inline comments.May 12 2017, 11:48 AM

sys/arm/mv/mv_machdep.c
263	I didn't create a macro as there are no macros for other bits in the existing ACTLR modifier, just the description/commentary.

I would prefer to move this code to platform_late_init().
The only drawback is that you must use cp15_actlr_set() + cp15_actlr_get() for boot CPU...

sys/arm/mv/mv_machdep.c
263	ACTRL differs per Cortex class, and many bits are not defined at all (but these are used in ARM errata). So I think that hardcoded value with comment is bit a more readable that one time used define.

In D9864#222185, @meloun-miracle-cz wrote:

I would prefer to move this code to platform_late_init().
The only drawback is that you must use cp15_actlr_set() + cp15_actlr_get() for boot CPU...

Let's get to consensus on that.
Do you prefer to drop this commit and use cp15_actlr_get + cp15_actlr_set?

Abandon. Will use environment variables.

Revision Contents
Changeset List

Path

Size

sys/

arm/

mv/

mv_machdep.c

7 lines

Diff 28247

View Options

Enable L1 Dcache prefetch for Cortex A9 CPUsAbandonedPublicActions