The main trick for saving memory in the VM_PHYSSEG_DENSE case is to try
to allocate the VM page array (the 'struct vm_page' array representing
all VM pages, named 'vm_page_array') at one of the boundaries of the
single physical address span, so that the array does not have to contain
the 'struct vm_page' entries corresponding to its own storage.  On amd64,
this saves at least 0.064% of the total amount of physical memory (e.g.,
~44.3MB on a 64GB machine).  See the comments for more details.

Prior to this commit, that trick was applied only if there was enough
memory available in the last chunk (highest addresses) and if
PMAP_HAS_PAGE_ARRAY was not defined.  PMAP_HAS_PAGE_ARRAY is a
per-architecture knob that lets the pmap itself build the 'struct
vm_page' array, with the aim of allocating the pages backing this array
in the same NUMA domain as the pages they describe.  By construction, if
there are multiple domains (at least 3, or only 2 but with the first and
last physical chunks belonging to the same one), the trick cannot be
applied as is, because pages of the array are then allocated in
different physical chunks corresponding to the different domains.
Architectures currently defining/implementing PMAP_HAS_PAGE_ARRAY are
amd64 and powerpc.  So, in practice, the space-saving trick has not been
applied on these architectures since the introduction of
PMAP_HAS_PAGE_ARRAY.

This commit introduces the following enhancements:
- Even if PMAP_HAS_PAGE_ARRAY is defined, the trick is applied if there
  is only one VM domain, in which case pmap_page_array_startup() is not
  called.
- Early allocations of memory (those made before the VM page array is
  allocated) now systematically happen at the boundaries of the
  available physical memory, as this reduces the final span of physical
  memory, and thus the size of the VM page array.  The saving amounts to
  a sizeof(struct vm_page)/PAGE_SIZE fraction of the memory removed from
  the single span by this change, e.g., around 2.5% (104/4096) on amd64.
- The VM page array can now also be allocated at the start of the
  physical memory span, not only at its end, which increases the chances
  of the trick applying (although probably not on amd64, where the first
  chunk generally seems too small to contain the VM page array).  As
  before, the VM page array is allocated in one piece, so it must fit
  entirely in either the first or the last chunk for the trick to apply
  (this could be improved).

While here, simplify vm_page_startup() by moving all the VM page array
allocation logic into vm_page_array_alloc(), and by moving the
allocation of the dump pages' bitmap before that of the witness pages.
The latter allows registering witness pages to be dumped just after
they have been allocated (and no longer keeping some witness state in
function-scope variables).

While here, as pmap_page_array_startup() is now called only when there
are at least two domains (i.e., the machine is really NUMA), replace
now-dead code in moea64_page_array_startup() with an assertion on the
number of domains.