Add a missing data barrier before invalidating the TLB and relax the
barrier afterwards. The first barrier ensures all stores to the page
tables have completed before the tlbi; the second ensures no later loads
or stores are moved before it.

While here, move an instruction barrier after all msr instructions in
start_mmu. The msr writes may not complete until the isb, so it is safer
to have them complete before invalidating the TLB.

Sponsored by: Arm Ltd