Stop performing a full icache sync when the DIC and IDC flags are set
Closed, Public

Authored by andrew on May 15 2020, 3:09 PM.
Details

Summary

The DIC and IDC bits in the CTR_EL0 register signal to the kernel when it
can relax the instruction cache synchronisation operations. The IDC bit
means we can relax cleaning the data cache to the point of unification
while the DIC bit means we don't need to invalidate the instruction cache
for data coherence. In both cases an appropriate barrier is still needed.

For now only implement the case where both bits are set, as is the case
on the Neoverse N1 as used in the Amazon AWS Graviton 2 CPU. Note that
this behaviour is optional on the N1, so we may later need to handle
only one or the other bit being set.

There is a tunable to disable each flag on boot.
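
As a rough illustration of the decision described above, the following is a minimal userland sketch, not the code from this patch. It assumes the architectural bit positions of CTR_EL0 (IDC at bit 28, DIC at bit 29). The names choose_icache_sync, allow_idc and allow_dic are hypothetical stand-ins for the kernel logic and the two tunables.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions in CTR_EL0 (macro names are illustrative). */
#define	SKETCH_CTR_IDC_SHIFT	28
#define	SKETCH_CTR_DIC_SHIFT	29

enum icache_sync_kind {
	ICACHE_SYNC_FULL,		/* clean dcache to PoU + invalidate icache */
	ICACHE_SYNC_BARRIER_ONLY	/* only the barriers (dsb + isb) */
};

/*
 * Hypothetical helper: pick the sync strategy from a CTR_EL0 value.
 * allow_idc/allow_dic model the boot-time tunables; clearing either one
 * makes the corresponding hardware flag be ignored.
 */
static enum icache_sync_kind
choose_icache_sync(uint64_t ctr, bool allow_idc, bool allow_dic)
{
	bool idc = allow_idc && ((ctr >> SKETCH_CTR_IDC_SHIFT) & 1) != 0;
	bool dic = allow_dic && ((ctr >> SKETCH_CTR_DIC_SHIFT) & 1) != 0;

	/* Only the "both bits set" case is relaxed, as in the summary. */
	if (idc && dic)
		return (ICACHE_SYNC_BARRIER_ONLY);
	return (ICACHE_SYNC_FULL);
}
```

With both bits set and both tunables left enabled, the helper returns ICACHE_SYNC_BARRIER_ONLY. Any other combination falls back to the full sync.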

Test Plan

Boot on various Amazon M6g (Graviton2 based) instances.
On a 64-core instance I observed a 10s reduction in buildworld times with
a previous version of this patch (only one build with and without the change).

I timed 10 make buildkernel -j64 runs with an empty obj. I ignored the first
run as it was used to warm the cache. The user and real times showed no
significant change. The sys time shows the following improvement from an
earlier version of this patch. isync.sys is a full icache sync, no_isync
is with just a dsb & isb.

x isync.sys
+ no_isync.sys
+------------------------------------------------------------------------------+
|                     +                           x                            |
|+           + +      +            +    x  x    x *    ++x    x            x  x|
|         |___________M_______A___________|_______|_____A____________|         |
+------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   9        151.36        167.08        155.41     157.94778     5.6033289
+   9        135.34        158.05        144.11     147.28333     8.2816167
Difference at 95.0% confidence
        -10.6644 +/- 7.06605
        -6.75188% +/- 4.38105%
        (Student's t, pooled s = 7.07045)
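
For readers who want to check the figures above, here is a small sketch that recomputes the pooled standard deviation and the 95% interval from the N, Avg and Stddev columns. It is not ministat's source. The function names are hypothetical, the square root is a hand-rolled Newton iteration so the block needs no libm, and the critical value 2.120 (two-sided 95% Student's t for 16 degrees of freedom) is assumed to be what was used.

```c
/* Newton's method square root for v > 0 (avoids linking against libm). */
static double
sketch_sqrt(double v)
{
	double x = v;
	int i;

	if (v <= 0.0)
		return (0.0);
	for (i = 0; i < 60; i++)
		x = 0.5 * (x + v / x);
	return (x);
}

/* Pooled standard deviation of two samples with sizes n1, n2. */
static double
pooled_stddev(int n1, double s1, int n2, double s2)
{
	double num = (n1 - 1) * s1 * s1 + (n2 - 1) * s2 * s2;

	return (sketch_sqrt(num / (double)(n1 + n2 - 2)));
}

/* Half-width of the confidence interval for the difference of means. */
static double
diff_halfwidth(int n1, int n2, double pooled, double t_crit)
{
	return (t_crit * pooled * sketch_sqrt(1.0 / n1 + 1.0 / n2));
}
```

Feeding in the isync.sys and no_isync.sys rows (N = 9, Stddev 5.6033289 and 8.2816167, Avg 157.94778 and 147.28333) gives a pooled s of about 7.0704 and a half-width of about 7.066 around the difference of -10.6644, in line with the reported values.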

Running make buildworld -j4 on a 4-core M6g instance with a clean obj 10 times (after one discarded run to prime the caches), I get the following results. isync is with the sysctls set to 0, no_isync is with them set to 1.

x isync.user
+ no_isync.user
+------------------------------------------------------------------------------+
|x *           x xx     *         ++ +   +      +    x   ++ x      +          x|
| |_______________M___|_____A__________MA_____________|____|                   |
+------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  10       1660.01       1662.96      1660.655      1661.061    0.99780259
+  10       1660.07       1662.52      1661.455      1661.516    0.71498562
No difference proven at 95.0% confidence

x isync.sys
+ no_isync.sys
+------------------------------------------------------------------------------+
|+    +    + +  +    +  ++ + +                  x          x    x x   x  x   xx|
|       |________AM________|                              |_________A________| |
+------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  10          95.5         97.68         96.95        96.947    0.67774217
+  10         92.13         94.15         93.39        93.305    0.67871202
Difference at 95.0% confidence
        -3.642 +/- 0.637259
        -3.75669% +/- 0.645117%
        (Student's t, pooled s = 0.678227)

x isync.real
+ no_isync.real
+------------------------------------------------------------------------------+
|+ +            +        + +x +x  + x  x  + +x +                     xx  x   xx|
|          |_______________AM_____|________|___________A_M_________________|   |
+------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  10        445.98        447.92         447.1       447.003    0.79998681
+  10        444.91        446.69       445.985        445.92       0.63177
Difference at 95.0% confidence
        -1.083 +/- 0.677263
        -0.24228% +/- 0.151286%
        (Student's t, pooled s = 0.720802)

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 31100
Build 28790: arc lint + arc unit

Event Timeline

Remove CTLFLAG_NOFETCH that was left over from a copy & paste

sys/arm64/arm64/identcpu.c
67

Do you propose enabling these by default?

  • Enable by default
  • Clean up style a little
This revision is now accepted and ready to land. May 19 2020, 3:47 PM