This was motivated by looking at vm.pmap.kernel_pmaps during poudriere
runs, where I see many runs of 511 4KB mappings. Since UMA uses the
direct map for page-sized slabs, most allocations into kernel_object
are larger than 4KB, so we end up with page-sized holes that rarely get
filled. This inhibits superpage promotion and causes fragmentation,
since kernel_object reservations remain only partially populated. For
example:
```
0xfffffe0215200000-0xfffffe02157ff000 rw-s- WB 0 2 511
0xfffffe0215800000-0xfffffe0215dff000 rw-s- WB 0 2 511
0xfffffe0215e00000-0xfffffe02163ff000 rw-s- WB 0 2 511
0xfffffe0216400000-0xfffffe02165ff000 rw-s- WB 0 0 511
0xfffffe0216600000-0xfffffe02167ff000 rw-s- WB 0 0 511
0xfffffe0216800000-0xfffffe02169ff000 rw-s- WB 0 0 511
0xfffffe0216a00000-0xfffffe0216dff000 rw-s- WB 0 1 511
0xfffffe0216e00000-0xfffffe02175ff000 rw-s- WB 0 3 511
```
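(The last three columns look like counts of 1GB, 2MB, and 4KB mappings,
so 511 is one 4KB page short of a full 2MB run.) For anyone wanting to
poke at this programmatically, here is a minimal userland sketch that
pulls the dump out of the kernel; it assumes only the sysctl name above
and the usual two-call sysctl(3) size-then-read idiom:
```
/*
 * Sketch: fetch the vm.pmap.kernel_pmaps text and write it to stdout.
 */
#include <sys/types.h>
#include <sys/sysctl.h>

#include <err.h>
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
	char *buf;
	size_t len;

	/* First call: ask how large the dump currently is. */
	if (sysctlbyname("vm.pmap.kernel_pmaps", NULL, &len, NULL, 0) != 0)
		err(1, "sysctlbyname");

	/* The table can grow between the two calls; pad a little. */
	len += len / 8;
	if ((buf = malloc(len)) == NULL)
		err(1, "malloc");

	/* Second call: fetch the text itself. */
	if (sysctlbyname("vm.pmap.kernel_pmaps", buf, &len, NULL, 0) != 0)
		err(1, "sysctlbyname");

	(void)fwrite(buf, 1, len, stdout);
	free(buf);
	return (0);
}
```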
I tried measuring 2MB mapping usage within the kernel map during the
first few minutes of a poudriere run:
Before: https://reviews.freebsd.org/P378
After: https://reviews.freebsd.org/P379
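For reference, a rough sketch of the sort of accounting involved (not
necessarily how the numbers in those pastes were produced): read a dump
on stdin, e.g. `sysctl -n vm.pmap.kernel_pmaps | ./count2m`, and total
the per-size mapping counts, again assuming the last three columns are
1GB/2MB/4KB counts:
```
#include <stdio.h>

int
main(void)
{
	char line[256];
	long n1g, n2m, n4k;
	long t1g = 0, t2m = 0, t4k = 0;

	while (fgets(line, sizeof(line), stdin) != NULL) {
		/* Skip headers and anything else that doesn't match. */
		if (sscanf(line, "%*s %*s %*s %ld %ld %ld",
		    &n1g, &n2m, &n4k) != 3)
			continue;
		t1g += n1g;
		t2m += n2m;
		t4k += n4k;
	}

	/* Express 2MB mapping usage as a share of mapped kernel memory. */
	double mb = t1g * 1024.0 + t2m * 2.0 + t4k * 4.0 / 1024.0;
	printf("1GB: %ld  2MB: %ld  4KB: %ld mappings\n", t1g, t2m, t4k);
	if (mb > 0)
		printf("2MB pages map %.0f of %.0f MB total (%.1f%%)\n",
		    t2m * 2.0, mb, t2m * 2.0 * 100.0 / mb);
	return (0);
}
```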
There are some other approaches that would also help:
- Use a larger import quantum on platforms where KVA is cheap
- Use the per-domain arenas to manage physical memory instead of KVA
The second would avoid creating holes, but we'd still have internal
fragmentation because 4KB allocations are rare. Coalescing across 2MB
boundaries would also be less likely to occur, and we would want some
mechanism to reclaim memory from the arenas during a severe shortage.
I still see a number of holes even with the patch applied; I'm not yet
sure why. It might be that something is occasionally allocating and
freeing 4KB of memory using kmem_malloc().