This was motivated by looking at vm.pmap.kernel_pmaps during poudriere
runs, where I see many runs of 511 4KB mappings. Since UMA uses the
direct map for page-sized slabs, most allocations into kernel_object are
larger than 4KB, so we end up with page-sized holes. These holes inhibit
superpage promotion and cause fragmentation, since kernel_object
reservations remain partially populated. For example:
```
0xfffffe0215200000-0xfffffe02157ff000 rw-s- WB 0 2 511
0xfffffe0215800000-0xfffffe0215dff000 rw-s- WB 0 2 511
0xfffffe0215e00000-0xfffffe02163ff000 rw-s- WB 0 2 511
0xfffffe0216400000-0xfffffe02165ff000 rw-s- WB 0 0 511
0xfffffe0216600000-0xfffffe02167ff000 rw-s- WB 0 0 511
0xfffffe0216800000-0xfffffe02169ff000 rw-s- WB 0 0 511
0xfffffe0216a00000-0xfffffe0216dff000 rw-s- WB 0 1 511
0xfffffe0216e00000-0xfffffe02175ff000 rw-s- WB 0 3 511
```
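Each of those runs of 511 4KB mappings is a 2MB-aligned region with exactly
one page missing. As a rough illustration of why a single hole matters (a
simplified sketch, not the in-tree pmap_promote_pde(), which also checks
physical contiguity and attribute consistency), promotion requires all 512
constituent PTEs to be valid:
```
#include <stdbool.h>
#include <stdint.h>

#define	NPTEPG	512	/* 4KB PTEs backing one 2MB region on amd64 */
#define	PG_V	0x1	/* amd64 PTE valid bit */

/*
 * Return true only if every PTE in the 2MB region is valid; a run of 511
 * (one missing page) is enough to make promotion impossible.
 */
static bool
can_promote_sketch(const uint64_t ptes[NPTEPG])
{
	for (int i = 0; i < NPTEPG; i++)
		if ((ptes[i] & PG_V) == 0)
			return (false);
	return (true);
}
```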
I measured 2MB mapping usage within the kernel map during the first few
minutes of a poudriere run, before and after this change:
- Before: https://reviews.freebsd.org/P378
- After: https://reviews.freebsd.org/P379
There are some other approaches that would also help:
- Use a larger import quantum on platforms where KVA is cheap (sketched below)
- Use the per-domain arenas to manage physical memory instead of KVA
The second would avoid creating holes, but we would still have internal
fragmentation due to the lack of 4KB allocations.
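For the first approach, a minimal sketch of what a larger import quantum
might look like, assuming the KVA_QUANTUM definitions in sys/vm/vm_kern.c
are roughly as I remember them; the KVA_IMPORT_QUANTUM name and the 4x
factor are purely illustrative:
```
#if VM_NRESERVLEVEL > 0
#define	KVA_QUANTUM_SHIFT	(VM_LEVEL_0_ORDER + PAGE_SHIFT)	/* 2MB on amd64 */
#else
#define	KVA_QUANTUM_SHIFT	(8 + PAGE_SHIFT)
#endif
#define	KVA_QUANTUM		(1ul << KVA_QUANTUM_SHIFT)

#ifdef __LP64__
/*
 * Hypothetical: KVA is cheap on 64-bit platforms, so import several
 * superpage-sized chunks at a time; the vmem_set_import() calls in
 * kmem_init() would pass this instead of KVA_QUANTUM.
 */
#define	KVA_IMPORT_QUANTUM	(4 * KVA_QUANTUM)
#else
#define	KVA_IMPORT_QUANTUM	KVA_QUANTUM
#endif
```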
I still see a number of holes even with the patch applied; I'm not yet sure
why. It might be that something occasionally allocates and frees 4KB of
memory using kmem_malloc().
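To track down those stray 4KB allocations, this is the sort of temporary
instrumentation I'd add to the kmem_malloc() path; the helper name, its
placement, and the bootverbose gating are all just illustrative assumptions:
```
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kdb.h>

static u_long kmem_malloc_page_count;

/* Call from kmem_malloc()/kmem_malloc_domainset() with the request size. */
static __inline void
kmem_malloc_trace_page(vm_size_t size)
{
	if (size == PAGE_SIZE) {
		atomic_add_long(&kmem_malloc_page_count, 1);
		if (bootverbose)
			kdb_backtrace();	/* show who is allocating single pages */
	}
}
```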