Only flush bp_kernload from the dcache; no need to sync the icache on the boot CPU.

syncicache() syncs the icache only on the current CPU; it does not touch
the cache on any other core. Replace the call with cpu_flush_dcache().
Since the boot CPU does not touch bp_kernload again in this code path,
the cache-line invalidation that dcbf performs costs nothing extra
compared with the dcbst used by syncicache().
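To illustrate the reasoning, here is a toy C model (not kernel code) of a
single write-back dcache line. The names cache_line, model_dcbst(),
model_dcbf(), other_cpu_load() and publish() are hypothetical, and the
model assumes the other core reads bp_kernload from memory without a
stale copy in its own cache. Both operations push the dirty data to
memory; dcbf additionally invalidates the boot CPU's copy, which only
matters if the boot CPU reads the line again.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one write-back dcache line on the boot CPU (hypothetical). */
struct cache_line {
	bool valid;
	bool dirty;
	uint32_t data;
};

/* Memory word backing the line; stands in for bp_kernload. */
static uint32_t memory_word;

/* A store on the boot CPU only updates its cached copy. */
static void cpu_store(struct cache_line *l, uint32_t v)
{
	l->valid = true;
	l->dirty = true;
	l->data = v;
}

/* dcbst model: write a dirty line back to memory, keep it valid. */
static void model_dcbst(struct cache_line *l)
{
	if (l->valid && l->dirty) {
		memory_word = l->data;
		l->dirty = false;
	}
}

/* dcbf model: write a dirty line back to memory, then invalidate it. */
static void model_dcbf(struct cache_line *l)
{
	model_dcbst(l);
	l->valid = false;
}

/* Another core that (by assumption) reads the value straight from memory. */
static uint32_t other_cpu_load(void)
{
	return memory_word;
}

/*
 * Store v on the boot CPU, apply the given flush, and return what the
 * other core sees. *still_cached reports whether the boot CPU kept its copy.
 */
static uint32_t publish(void (*flush)(struct cache_line *), uint32_t v,
    bool *still_cached)
{
	struct cache_line line = { false, false, 0 };

	memory_word = 0;
	cpu_store(&line, v);
	flush(&line);
	*still_cached = line.valid;
	return other_cpu_load();
}
```

In this model the other core sees the new value after either flush; the
only difference is that after dcbf the boot CPU would miss on its next
access to the line, and in this code path there is no such access.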