Introduce pmap_store(), and use it to replace pmap_load_store() in places where the page table entry was previously invalid. (Note that I did not replace pmap_load_store() when it is followed by a TLB invalidation, even if the return value from pmap_load_store() is unused.)
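
As a rough illustration of the distinction, here is a userspace sketch using C11 atomics. The sketch_* names and the entry values are hypothetical stand-ins and are not the kernel definitions; the point is only that a swap returns the old entry, while a plain store does not need to when the old entry is known to be invalid.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit page table entry. */
typedef _Atomic uint64_t sketch_pte_t;

/*
 * Swap-style update: writes the new entry and returns the old one.
 * Useful when the caller must inspect the previous mapping.
 */
static uint64_t
sketch_load_store(sketch_pte_t *table, uint64_t entry)
{
	return (atomic_exchange(table, entry));
}

/*
 * Plain store: sufficient when the old entry is known to be invalid,
 * so there is nothing to read back.
 */
static void
sketch_store(sketch_pte_t *table, uint64_t entry)
{
	atomic_store(table, entry);
}
```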
In pmap_enter_l2(), when replacing an empty kernel page table page with a superpage mapping, clear the old l2 entry and issue a TLB invalidation before installing the new mapping. My reading of the architecture manual leads me to believe that the TLB could hold an intermediate entry referencing the empty kernel page table page even though it contains no valid mappings.
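
The ordering I have in mind can be modelled in userspace as follows. This is only a sketch under assumptions: the sketch_* names, the valid bit, and the counting stub that stands in for the TLB invalidation are all hypothetical, and the real pmap_enter_l2() does considerably more.

```c
#include <stdatomic.h>
#include <stdint.h>

#define	SKETCH_VALID	0x1ULL	/* hypothetical "entry is valid" bit */

/* Hypothetical stand-in for a 64-bit l2 entry. */
typedef _Atomic uint64_t sketch_pde_t;

/* Counts invalidations so the ordering can be checked in a test. */
static int sketch_tlbi_count;

/* Stand-in for a real TLB invalidation; it only bumps the counter. */
static void
sketch_tlb_invalidate(void)
{
	sketch_tlbi_count++;
}

/*
 * Replace a (possibly valid) table entry with a block mapping:
 * clear the old entry, invalidate, and only then store the new one.
 */
static void
sketch_replace_table_with_block(sketch_pde_t *l2, uint64_t new_l2)
{
	if ((atomic_load(l2) & SKETCH_VALID) != 0) {
		atomic_store(l2, 0);
		sketch_tlb_invalidate();
	}
	atomic_store(l2, new_l2);
}
```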
Replace a couple of direct uses of atomic_clear_64() with the new pmap_clear_bits().
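
For readers unfamiliar with the operation, a userspace sketch of a clear-bits helper follows. It assumes C11 atomics, and sketch_clear_bits() is a hypothetical name rather than the kernel's definition; it just shows an atomic read-modify-write that clears the requested bits and leaves the rest of the entry intact.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit page table entry. */
typedef _Atomic uint64_t sketch_entry_t;

/* Atomically clear the given bits in the entry; other bits are kept. */
static void
sketch_clear_bits(sketch_entry_t *table, uint64_t bits)
{
	atomic_fetch_and(table, ~bits);
}
```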
In a couple of comments, replace the term "paging-structure caches", which is Intel-specific terminology, with wording that is more consistent with the ARM architecture manual.