To reduce the number of "dsb ish" operations, restructure cache_handle_range so that all of the "dc" operations are performed before any "ic" operation. As I interpret the ARM ARM, a single "dsb ish" between the "dc" and "ic" operations and a single "dsb ish" after the "ic" operations then suffice.
Linux 5.1.16's flush_icache_range() in arch/arm64/mm/cache.S and its macro invalidate_icache_by_line in arch/arm64/include/asm/assembler.h are similarly structured.
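
The ordering described above can be sketched as a small portable C model. This is not the actual cache_handle_range assembly: the function name sync_icache_range_model, the trace buffer, and the dc_line/ic_line/dsb_ish helpers are all hypothetical stand-ins that only record which operation would be issued, so that the resulting sequence can be inspected. The line sizes are passed in as parameters and assumed to be powers of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: each enumerator stands in for one real instruction. */
enum op { OP_DC, OP_IC, OP_DSB_ISH };

#define TRACE_MAX 256
static enum op trace[TRACE_MAX];   /* operations in issue order */
static size_t trace_len;

static void emit(enum op o)
{
    if (trace_len < TRACE_MAX)
        trace[trace_len++] = o;
}

/* Stand-ins for a "dc" op on one line, an "ic" op on one line, and "dsb ish". */
static void dc_line(uintptr_t va) { (void)va; emit(OP_DC); }
static void ic_line(uintptr_t va) { (void)va; emit(OP_IC); }
static void dsb_ish(void)         { emit(OP_DSB_ISH); }

/*
 * Model of the restructured range operation: every "dc" first, one
 * barrier, every "ic" next, one final barrier.  dline and iline are the
 * assumed D-cache and I-cache line sizes in bytes.
 */
static void sync_icache_range_model(uintptr_t start, size_t len,
                                    size_t dline, size_t iline)
{
    uintptr_t end, va;

    assert(dline != 0 && (dline & (dline - 1)) == 0);
    assert(iline != 0 && (iline & (iline - 1)) == 0);
    if (len == 0)
        return;
    end = start + len;

    /* Pass 1: all data cache operations, no barrier between lines. */
    for (va = start & ~(uintptr_t)(dline - 1); va < end; va += dline)
        dc_line(va);
    dsb_ish();      /* single barrier between the "dc" and "ic" passes */

    /* Pass 2: all instruction cache operations. */
    for (va = start & ~(uintptr_t)(iline - 1); va < end; va += iline)
        ic_line(va);
    dsb_ish();      /* single barrier after the "ic" pass */
}
```

In this model the number of "dsb ish" entries in the trace is always two for a non-empty range, independent of how many cache lines the range covers.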