MFC r349003, r349031, r349042, r349129, r349290, r349618, r349798

Change pmap_demote_l2_locked() so that it removes the superpage mapping on a demotion failure. Otherwise, some callers to pmap_demote_l2_locked(), such as pmap_protect(), may leave an incorrect mapping in place on a demotion failure.

Change pmap_demote_l2_locked() so that it handles addresses that are not superpage aligned. Some callers to pmap_demote_l2_locked(), such as pmap_protect(), may not pass a superpage aligned address.

Optimize TLB invalidation in pmap_remove_l2().

Change the arm64 pmap so that updates to the global count of wired pages are not performed directly by the pmap. Instead, they are performed by vm_page_free_pages_toq().

Batch the TLB invalidations that are performed by pmap_protect() rather than performing them one at a time.

Eliminate a redundant call to pmap_invalidate_page() from pmap_ts_referenced().

Introduce pmap_remove_l3_range() and use it in two places: (1) pmap_remove(), where it eliminates redundant TLB invalidations by pmap_remove() and pmap_remove_l3(), and (2) pmap_enter_l2(), where it may optimize the TLB invalidations by batching them.

Implement pmap_copy().

Three changes to pmap_enter():

1. Use _pmap_alloc_l3() instead of pmap_alloc_l3() in order to handle the possibility that a superpage mapping for "va" was created while we slept.

2. Eliminate code for allocating kernel page table pages. Kernel page table pages are preallocated by pmap_growkernel().

3. Eliminate duplicated unlock operations when KERN_RESOURCE_SHORTAGE is returned.
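
For illustration only, the following user-space sketch shows two of the ideas mentioned above: truncating an arbitrary address to its superpage boundary, and batching TLB invalidations over contiguous runs of pages instead of issuing one per page. It is not the kernel code from these revisions. All names prefixed with sketch_ or SKETCH_, the tlb_log structure, and the 4 KiB page / 2 MiB superpage sizes are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes: 4 KiB base pages and 2 MiB L2 superpages. */
#define SKETCH_PAGE_SIZE	((uint64_t)1 << 12)
#define SKETCH_L2_SIZE		((uint64_t)1 << 21)
#define SKETCH_L2_OFFSET	(SKETCH_L2_SIZE - 1)

/* Hypothetical stand-in for a range TLB invalidation; it only records calls. */
struct tlb_log {
	size_t		calls;
	uint64_t	last_start;
	uint64_t	last_end;
};

static void
sketch_invalidate_range(struct tlb_log *log, uint64_t sva, uint64_t eva)
{
	assert(sva < eva);
	log->calls++;
	log->last_start = sva;
	log->last_end = eva;
}

/* Round an arbitrary address down to the start of its superpage. */
static uint64_t
sketch_trunc_l2(uint64_t va)
{
	return (va & ~SKETCH_L2_OFFSET);
}

/*
 * Walk [sva, eva) one base page at a time.  changed[i] is nonzero when
 * page i was modified and needs invalidation.  Each contiguous run of
 * modified pages is flushed with a single range invalidation.
 */
static void
sketch_protect_range(struct tlb_log *log, uint64_t sva, uint64_t eva,
    const int *changed)
{
	uint64_t run_start, va;
	size_t i;
	int in_run;

	run_start = 0;
	in_run = 0;
	for (va = sva, i = 0; va < eva; va += SKETCH_PAGE_SIZE, i++) {
		if (changed[i]) {
			if (!in_run) {
				/* Open a new run at the first modified page. */
				run_start = va;
				in_run = 1;
			}
		} else if (in_run) {
			/* The run ended at the previous page; flush it. */
			sketch_invalidate_range(log, run_start, va);
			in_run = 0;
		}
	}
	if (in_run)
		sketch_invalidate_range(log, run_start, eva);
}
```

With six pages of which only the fourth is unmodified, sketch_protect_range() records two range invalidations rather than five per-page ones. sketch_trunc_l2() shows why a caller-supplied address can be made safe to use for a demotion even when it points into the middle of a superpage.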