Page MenuHomeFreeBSD

Restructure cache_handle_range to avoid unnecessary barriers
ClosedPublic

Authored by alc on Jul 4 2019, 3:00 AM.
Tags
None
Referenced Files
F107178410: D20848.diff
Sat, Jan 11, 7:40 AM
Unknown Object (File)
Mon, Jan 6, 9:09 AM
Unknown Object (File)
Tue, Dec 24, 6:47 AM
Unknown Object (File)
Sep 30 2024, 3:55 PM
Unknown Object (File)
Sep 11 2024, 12:49 AM
Unknown Object (File)
Sep 9 2024, 12:18 PM
Unknown Object (File)
Aug 22 2024, 12:23 AM
Unknown Object (File)
Aug 21 2024, 8:11 PM

Details

Summary

In order to reduce the number of "dsb ish" operations, restructure cache_handle_range so that all of the "dc" operations are performed before any "ic" operation. Then, as I interpret the ARM ARM, a single "dsb ish" between the "dc" and "ic" operations and a single "dsb ish" after the "ic" operations suffice.

Linux 5.1.16's flush_icache_range() in arch/arm64/mm/cache.S and its macro invalidate_icache_by_line in arch/arm64/include/asm/assembler.h are similarly structured.

Test Plan

With this change, the time that it takes to do a "make -j8 buildworld" on an Amazon EC2 a1.2xlarge (8 cores, 16 GB RAM) is reduced by 9%.

Since the a1.2xlarge cores are Cortex-A72's with PIPT L1 I-Caches, I also tested this patch with arm64_icache_sync_range()'s implementation changed from

ENTRY(arm64_icache_sync_range)
        /*                                                                                                                                                                          
         * XXX Temporary solution - I-cache flush should be range based for                                                                                                         
         * PIPT cache or IALLUIS for VIVT or VIPT caches                                                                                                                            
         */
/*      cache_handle_range      dcop = cvau, ic = 1, icop = ivau */
        cache_handle_range      dcop = cvau
        ic      ialluis
        dsb     ish
        isb
        ret
END(arm64_icache_sync_range)

to

ENTRY(arm64_icache_sync_range)
        cache_handle_range      dcop = cvau, ic = 1, icop = ivau
        ret
END(arm64_icache_sync_range)

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

This revision is now accepted and ready to land.Jul 4 2019, 1:31 PM

Just an FYI, when I modified arm64_icache_sync_range() to do the range-based "ic" operations, instead of flushing the entire L1 I-cache, performance got very slightly worse. I was somewhat surprised by this result.

My guess is we are hitting the crossing point where the instruction cost of the loop is greater than the cost of invalidating the entire instruction cache. We might need to have an invalidate all version of arm64_icache_sync_range then switch between them in the macro based on the length of the memory.