Apparently the SRAT may contain multiple contiguous segments as separate
entries. For example:
SRAT: Found memory domain 0 addr 0x0 len 0x80000000: enabled
SRAT: Found memory domain 0 addr 0x100000000 len 0x5f70000000: enabled
SRAT: Found memory domain 1 addr 0x6070000000 len 0x3000000000: enabled
SRAT: Found memory domain 1 addr 0x9070000000 len 0x2000000000: enabled
SRAT: Found memory domain 1 addr 0xb070000000 len 0x800000000: enabled
SRAT: Found memory domain 1 addr 0xb870000000 len 0x400000000: enabled
Currently this results in multiple contiguous entries in the affinity
table built from the SRAT, because the SRAT parser assumes that SRAT
entries are already coalesced where possible.
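For illustration only, the affinity table built from the SRAT output
above would contain entries like the following.  The struct is a
user-space stand-in modeled on struct mem_affinity, not the kernel
definition, and the exclusive end addresses are derived from the
base/length pairs in the log:

#include <stdint.h>

/* Stand-in for a memory affinity table entry: [start, end) in domain. */
struct affinity_entry {
    uint64_t start;
    uint64_t end;
    int domain;
};

/*
 * Affinity table corresponding to the SRAT output above.  The four
 * domain 1 entries abut one another and could be coalesced into a
 * single entry covering [0x6070000000, 0xbc70000000), but the parser
 * keeps them separate.
 */
static const struct affinity_entry affinity_table[] = {
    { 0x0,          0x80000000,   0 },
    { 0x100000000,  0x6070000000, 0 },
    { 0x6070000000, 0x9070000000, 1 },
    { 0x9070000000, 0xb070000000, 1 },
    { 0xb070000000, 0xb870000000, 1 },
    { 0xb870000000, 0xbc70000000, 1 },
};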
vm_phys_early_startup() uses the affinity table to carve up
phys_avail[] so that each entry is contained in a single domain.
However, when the affinity table contains multiple contiguous entries,
this carving also produces multiple contiguous phys_avail[] entries.
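A rough user-space sketch of the effect: clipping one large available
range against the (uncoalesced) domain 1 affinity entries yields four
abutting phys_avail[]-style entries instead of one.  The clipping loop
is illustrative and not the actual vm_phys_early_startup() code:

#include <stdint.h>
#include <stdio.h>

/* The four uncoalesced domain 1 affinity entries from the example. */
static const uint64_t dom1[][2] = {
    { 0x6070000000, 0x9070000000 },
    { 0x9070000000, 0xb070000000 },
    { 0xb070000000, 0xb870000000 },
    { 0xb870000000, 0xbc70000000 },
};

/*
 * Clip an available range against each affinity entry, mimicking how
 * phys_avail[] is carved so that every entry lies within a single
 * affinity entry.  One contiguous range comes out as four entries.
 */
static void
carve(uint64_t start, uint64_t end)
{
    for (size_t i = 0; i < sizeof(dom1) / sizeof(dom1[0]); i++) {
        uint64_t s = start > dom1[i][0] ? start : dom1[i][0];
        uint64_t e = end < dom1[i][1] ? end : dom1[i][1];

        if (s < e)
            printf("phys_avail: %#jx-%#jx\n",
                (uintmax_t)s, (uintmax_t)e);
    }
}

int
main(void)
{
    /* One contiguous available range spanning all of domain 1. */
    carve(0x6070000000, 0xbc70000000);
    return (0);
}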
Then, vm_phys segments are created from phys_avail[] entries. Since
r338431 contiguous segments are coalesced, so we end up with
vm_phys_segs[] entries that span multiple phys_avail[] entries.
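The segment construction can be modeled as below: adjacent
phys_avail[]-style pairs that abut are merged into one segment, which
is the coalescing behaviour added in r338431.  The struct is a
stand-in for struct vm_phys_seg, and domains are ignored for brevity:

#include <stdint.h>
#include <stddef.h>

/* Stand-in for struct vm_phys_seg; the domain is omitted for brevity. */
struct seg {
    uint64_t start;
    uint64_t end;
};

/*
 * Build segments from (start, end) pairs, extending the previous
 * segment whenever the next pair abuts it.  Feeding in the four
 * contiguous domain 1 phys_avail entries from the example yields a
 * single segment covering 0x6070000000-0xbc70000000.
 */
static size_t
build_segs(const uint64_t (*pairs)[2], size_t npairs, struct seg *segs)
{
    size_t nsegs = 0;

    for (size_t i = 0; i < npairs; i++) {
        if (nsegs > 0 && segs[nsegs - 1].end == pairs[i][0]) {
            segs[nsegs - 1].end = pairs[i][1];  /* coalesce */
        } else {
            segs[nsegs].start = pairs[i][0];
            segs[nsegs].end = pairs[i][1];
            nsegs++;
        }
    }
    return (nsegs);
}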
Finally, at the end of vm_page_startup() we add vm_pages to the
vm_phys freelists. We add a range for each vm_phys_segs[] entry for
which there is a covering phys_avail[] entry. However, the
fragmentation of phys_avail[] described above means that some segments
are never added to the vm_phys allocator, and as a result the system
leaves large amounts of RAM unused.
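The last step can be modeled as a covering check: a segment's pages
are only enqueued when a single phys_avail[]-style entry spans the
whole segment, so a coalesced segment that spans several entries fails
the check for every one of them and is silently skipped.  This is a
simplified model of the logic at the end of vm_page_startup(), not the
actual code:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Return true if some (start, end) pair fully covers
 * [seg_start, seg_end).
 */
static bool
seg_covered(uint64_t seg_start, uint64_t seg_end,
    const uint64_t (*pairs)[2], size_t npairs)
{
    for (size_t i = 0; i < npairs; i++) {
        if (pairs[i][0] <= seg_start && seg_end <= pairs[i][1])
            return (true);
    }
    return (false);
}

int
main(void)
{
    /* Fragmented phys_avail[]-style entries for domain 1. */
    const uint64_t pairs[][2] = {
        { 0x6070000000, 0x9070000000 },
        { 0x9070000000, 0xb070000000 },
        { 0xb070000000, 0xb870000000 },
        { 0xb870000000, 0xbc70000000 },
    };

    /* Prints "covered: no": the coalesced segment spans four pairs. */
    printf("covered: %s\n",
        seg_covered(0x6070000000, 0xbc70000000, pairs, 4) ? "yes" : "no");
    return (0);
}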
Fix the problem by ensuring that contiguous entries in the memory
affinity table are coalesced. I think we could instead change
vm_page_startup() to call vm_phys_enqueue_contig() on all subranges
covered by a phys_avail[] entry, and that would solve the problem too.
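The coalescing pass amounts to something like the loop below, merging
an entry into its predecessor when the domain matches and the ranges
abut.  This is an illustrative sketch of the approach rather than the
committed change:

#include <stdint.h>
#include <stddef.h>

/* Stand-in for a memory affinity table entry. */
struct affinity_entry {
    uint64_t start;
    uint64_t end;
    int domain;
};

/*
 * Merge physically contiguous entries that belong to the same domain.
 * The table is assumed to be sorted by start address.  Returns the new
 * entry count; applied to the example table, this collapses the four
 * domain 1 entries into one.
 */
static size_t
coalesce_affinity(struct affinity_entry *tab, size_t n)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        if (out > 0 && tab[out - 1].domain == tab[i].domain &&
            tab[out - 1].end == tab[i].start)
            tab[out - 1].end = tab[i].end;  /* extend previous entry */
        else
            tab[out++] = tab[i];
    }
    return (out);
}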