Page MenuHomeFreeBSD

Enable L1 Dcache prefetch for Cortex A9 CPUs
AbandonedPublic

Authored by zbb on Mar 2 2017, 2:30 PM.

Details

Reviewers
andrew
mmel
skra
Summary

Some of the processors do not have this feature
enabled by default. Turn it on as L1 prefetch will
result in significant performance improvement.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint Skipped
Unit
Unit Tests Skipped
Build Status
Buildable 9207
Build 9654: CI src buildJenkins

Event Timeline

zbb updated this revision to Diff 25886.Mar 2 2017, 2:30 PM
zbb retitled this revision from to Enable L1 Dcache prefetch for Cortex A9 CPUs.
zbb updated this object.
zbb edited the test plan for this revision. (Show Details)
zbb added reviewers: skra, mmel, andrew.
zbb set the repository for this revision to rS FreeBSD src repository.
zbb added a subscriber: ARM.
mmel requested changes to this revision.Mar 2 2017, 5:15 PM
mmel edited edge metadata.

See Erratum 571620, 719331, 719332, 751473 and probably more others.

Right place for proper CPU setup (including recommended errata fixes) is bootloader, not a boot stage of OS.

Please take in account:

  • on many boards, OS starts in in non-secure mode with disabled access to ACTRL (Pandaboard is one example).
  • effect of NS write to protected ACTRL is not consistent across for Cortex family, some CPU's ignore writes, some generates exception (A8, and probably some older A9).
  • the current code writes to ACTRL only if bits absolutely necessary for FreeBSD run are not set. In this case, it's irrelevant if we crashes here, or few instruction later.
This revision now requires changes to proceed.Mar 2 2017, 5:15 PM
zbb abandoned this revision.Mar 3 2017, 2:17 PM
In D9864#203677, @meloun-miracle-cz wrote:

See Erratum 571620, 719331, 719332, 751473 and probably more others.
Right place for proper CPU setup (including recommended errata fixes) is bootloader, not a boot stage of OS.
Please take in account:

  • on many boards, OS starts in in non-secure mode with disabled access to ACTRL (Pandaboard is one example).
  • effect of NS write to protected ACTRL is not consistent across for Cortex family, some CPU's ignore writes, some generates exception (A8, and probably some older A9).
  • the current code writes to ACTRL only if bits absolutely necessary for FreeBSD run are not set. In this case, it's irrelevant if we crashes here, or few instruction later.

Thanks for the detailed explanation. Closing this revision then.

zbb added a comment.Mar 3 2017, 6:57 PM

@meloun-miracle-cz just to consider all possibillities. Do you have any idea whatsoever how this change could be inserted into the OS (in case there is no possibility to modify firmware), even in form of an option or platform-specific stuff?

andrew edited edge metadata.Mar 3 2017, 8:09 PM

There is the PLATFORM code to handle differences between different ARM based SoCs.

mmel added a comment.Mar 4 2017, 6:51 AM

Yep, fully agree. From my point of view, platform_late_init() (for BP) is best place for performance tweaks. At this point, printf() works and cpuinfo is initialized, so you can check proper CPU revision, print some warning about potentially dangerous action, etc...
For AP, you can use standard platform_mp_start_ap(), but many ACTRL bits are shared between CPU's. So, typically, it's sufficient to set up only BP.

Unfortunately, platform_late_init() is called to late for majority of errata fixups - in this case, i think that only real safe solution is bootloader fix.

zbb added a comment.Mar 6 2017, 10:54 AM
In D9864#204162, @meloun-miracle-cz wrote:

Yep, fully agree. From my point of view, platform_late_init() (for BP) is best place for performance tweaks. At this point, printf() works and cpuinfo is initialized, so you can check proper CPU revision, print some warning about potentially dangerous action, etc...
For AP, you can use standard platform_mp_start_ap(), but many ACTRL bits are shared between CPU's. So, typically, it's sufficient to set up only BP.
Unfortunately, platform_late_init() is called to late for majority of errata fixups - in this case, i think that only real safe solution is bootloader fix.

Thanks for your answers and suggestions.
I can see few problems here, please correct me if I'm wrong.

  1. We could modify ACTRL bits in the platform_late_init() but this applies to the primary CPU only. Other CPUs will not be affected.
  2. Secondary CPUs do not have their bits modified anywhere outside init_secondary(). platform_mp_start_ap() brings up secondary CPUs and is not executed by them.

Therefore the main place to modify ACTRL for all CPUs is still cpuinfo_get_actlr_modifier() + reinit_mmu().
Firmware cannot be modified ;-). That is why I'm looking for other options to have this change in the mainline FreeBSD.

zbb reclaimed this revision.Mar 6 2017, 10:55 AM
This revision now requires changes to proceed.Mar 6 2017, 10:55 AM
zbb added a comment.Mar 7 2017, 2:28 PM

BTW. How do you expect u-boot to turn this bit on if it always works in UP?

My suggestion at this point would be to turn on L1 prefetch after checking CPU revision to ensure that we don't drive into errata.
What is your opinion?

mmel added a comment.Mar 7 2017, 3:22 PM

I'm sorry about my platform_mp_start_ap() mistake.
I understand that cpuinfo_get_actlr_modifier() looks like most direct (and most easier) way how to make this. But again, this setting must be platform dependent.
Assume that you have 2 different boards (with different bootloaders) but with exactly same CPU. One board start OS in secure world, one in non-secure. . Then, write to ACTRL is OK for first one, but may cause exception on second.

IMO, we can add another platform method, say platform_late_init_ap(), called from here https://svnweb.freebsd.org/base/head/sys/arm/arm/mp_machdep.c?view=markup#l182.

Is this sufficient for you ?

zbb added a comment.Mar 7 2017, 3:44 PM
In D9864#204946, @meloun-miracle-cz wrote:

I'm sorry about my platform_mp_start_ap() mistake.
I understand that cpuinfo_get_actlr_modifier() looks like most direct (and most easier) way how to make this. But again, this setting must be platform dependent.
Assume that you have 2 different boards (with different bootloaders) but with exactly same CPU. One board start OS in secure world, one in non-secure. . Then, write to ACTRL is OK for first one, but may cause exception on second.
IMO, we can add another platform method, say platform_late_init_ap(), called from here https://svnweb.freebsd.org/base/head/sys/arm/arm/mp_machdep.c?view=markup#l182.
Is this sufficient for you ?

Hello. Thanks for your help with that.
In general, I don't insist on adding this change to all Cortex-A9 CPUs and I acknowledge your arguments.
Another platform-dependent method is OK for us. We need something that will be executed on both primary and secondary CPUs. It would be best if this stays in a common place for both primary and secondary CPUs (such as cpu_setup() is common for CPU0 and CPUx).
What do you think?

mmel added a comment.Mar 7 2017, 4:07 PM

In D9864#204948, @zbb wrote:
In general, I don't insist on adding this change to all Cortex-A9 CPUs and I acknowledge your arguments.
Another platform-dependent method is OK for us. We need something that will be executed on both primary and secondary CPUs. It would be best if this stays in a common place for both primary and secondary CPUs (such as cpu_setup() is common for CPU0 and CPUx).
What do you think?

Yep, I have no problem with this {if cpu_setup() is new platform method). I prefer to call this new function in the place where cpuinfo is initialized and printf is working.

zbb added a comment.Mar 7 2017, 6:04 PM
In D9864#204949, @meloun-miracle-cz wrote:

In D9864#204948, @zbb wrote:
In general, I don't insist on adding this change to all Cortex-A9 CPUs and I acknowledge your arguments.
Another platform-dependent method is OK for us. We need something that will be executed on both primary and secondary CPUs. It would be best if this stays in a common place for both primary and secondary CPUs (such as cpu_setup() is common for CPU0 and CPUx).
What do you think?

Yep, I have no problem with this {if cpu_setup() is new platform method). I prefer to call this new function in the place where cpuinfo is initialized and printf is working.

I have another idea. Please let me know your opinion.

  1. Instead of creating another platform method that will be used only by A38X, we will add SYSINIT entry to mv_machdep.c (ifdef ARMADA38X)
  2. This SYSINIT will be called after SMP is initialized (in case of multi-core kernel) and will set L1 prefetch bit for all CPUs using smp_rendezvous() function.
  3. We will comment add vast commentary to this sysinit, why we are doing this, etc.

What do you think? Is this a better idea or we should rather proceed with the new global platform method?

mmel added a comment.Mar 8 2017, 5:56 AM

Ahh, I see now, Marvell subtree is not fully converted to platform model :) . But trust me, conversion its not that hard and makes your life much easier.
Also, there is global effort to make single generic kernel for all supported armv6 boards, and #ifdef SoC only technique is in direct direct contradiction with this goal.
See allwinner subtree, mainly aw_machdep.c. This platform supports numerous board with single kernel in 'right' way, without #ifdef hell.

But anyway, because SYSINIT() based proposal doesn't 'collide' with other boards/platforms, I have no objections. Now platform method seems more flexible, but that's just my personal opinion.

zbb added a comment.Mar 8 2017, 10:21 AM
In D9864#205105, @meloun-miracle-cz wrote:

Ahh, I see now, Marvell subtree is not fully converted to platform model :) . But trust me, conversion its not that hard and makes your life much easier.
Also, there is global effort to make single generic kernel for all supported armv6 boards, and #ifdef SoC only technique is in direct direct contradiction with this goal.
See allwinner subtree, mainly aw_machdep.c. This platform supports numerous board with single kernel in 'right' way, without #ifdef hell.
But anyway, because SYSINIT() based proposal doesn't 'collide' with other boards/platforms, I have no objections. Now platform method seems more flexible, but that's just my personal opinion.

Armada stuff provides platform_late_init, etc. but the problem is that in order to set up this bit on all CPUs we need to call platform method on each of them. We would need to add another platform_<something>_init for all platforms and call it from both primary and secondary CPUs (whereas platform_late_init is called only on the primary CPU). That is why I suggested SYSINIT as it sounds equally hackish and yet less expansive.

mmel added a comment.Mar 8 2017, 11:29 AM
In D9864#205185, @zbb wrote:

Armada stuff provides platform_late_init, etc.

platform_late_init() is only transitional hack, full platform model needs FDT_PLATFORM_DEF() macro.

In D9864#205185, @zbb wrote:

but the problem is that in order to set up this bit on all CPUs we need to call platform method on each of them. We would need to add another platform_<something>_init for all platforms and call it from both primary and secondary CPUs (whereas platform_late_init is called only on the primary CPU). That is why I suggested SYSINIT as it sounds equally hackish and yet less expansive.

That's why I pointed to aw_machdep.c. Look to: https://svnweb.freebsd.org/base/head/sys/arm/allwinner/aw_machdep.c?revision=311237&view=markup#l167
With this style, you can mix 'platform specific' and 'platform universal' functions, you can save platform/cpu type to variable and then use it at runtime....

And with combination of new platform_late_init_ap() method (defaulting to nop) (see: https://svnweb.freebsd.org/base/head/sys/arm/arm/platform_if.m?limit_changes=0&view=markup#l57 for example), everything whats you need is single function (cpu_init)( from above) declared in appropriate platform methods table (assuming that same function can be called directly from platform_late_init()).

zbb updated this revision to Diff 28247.May 11 2017, 4:01 PM
zbb edited edge metadata.

Reworked to use platform call.

andrew added inline comments.May 11 2017, 6:12 PM
sys/arm/mv/mv_machdep.c
263

You could have this use cp15_actlr_get, cp15_actlr_get in the platform_late_init handler (with my suggestion in D10682).

zbb added inline comments.May 12 2017, 10:29 AM
sys/arm/mv/mv_machdep.c
263

We could use cp15_actlr_get + cp15_actlr_get' but we would need to explicitly ensure that platform_cpu_init` or any future equivalent is called after reinit_mmu which modifies the default settings of ACTLR and exchange them to the default hard-coded ones.
So long story-short. We need to call this before reinit_mmu so that we can change mapping types and we need to call this after reinit_mmu to be able to use actlr_get/set sequence. This makes no sense.

I would rather assume that we call this before the generic settings are applied as the generic settings close the windows for modifications.

andrew added inline comments.May 12 2017, 10:40 AM
sys/arm/mv/mv_machdep.c
263

I don't see this as a problem as we already need to do this to ensure we don't change the existing fields, other than the fields we wish to change.

It would also be nice if we could have macros for the magic bits.

zbb added inline comments.May 12 2017, 11:48 AM
sys/arm/mv/mv_machdep.c
263

I didn't create a macro as there are no macros for other bits in the existing ACTLR modifier, just the description/commentary.

mmel added a comment.EditedMay 14 2017, 7:55 AM

I would prefer to move this code to platform_late_init().
The only drawback is that you must use cp15_actlr_set() + cp15_actlr_get() for boot CPU...

sys/arm/mv/mv_machdep.c
263

ACTRL differs per Cortex class, and many bits are not defined at all (but these are used in ARM errata). So I think that hardcoded value with comment is bit a more readable that one time used define.

zbb added a comment.May 15 2017, 11:44 AM
In D9864#222185, @meloun-miracle-cz wrote:

I would prefer to move this code to platform_late_init().
The only drawback is that you must use cp15_actlr_set() + cp15_actlr_get() for boot CPU...

Let's get to consensus on that.
Do you prefer to drop this commit and use cp15_actlr_get + cp15_actlr_set?

zbb abandoned this revision.Jun 16 2017, 1:48 PM

Abandon. Will use environment variables.