With r311346 we apply MADV_FREE upon every execve. Despite the recent
changes to pmap_advise() and vm_object_madvise(), this still has a large
amount of overhead on many-CPU systems, primarily because
pmap_advise(MADV_FREE) clears dirty bits and thus requires a TLB
shootdown on x86. On an x1.32xlarge EC2 instance (128 vCPUs), removing
this overhead gives a ~7.5% reduction in wall time for a -j128
buildkernel, and nearly a 50% reduction in system CPU time.

To avoid this overhead, use a lowmem handler to move exec args pages
close to the head of the inactive queue prior to an inactive queue scan.
A generation counter is added to track lowmem calls; when freeing exec
args, a pending generation count will cause MADV_FREE to be applied.
This ensures that all but 260KB*ncpu worth of memory will be reclaimed,
and remaining pages will likely be reclaimed upon a subsequent scan.
Given the overhead of applying MADV_FREE, this seems to be a better
tradeoff. (On the EC2 instance in question, 260KB*128 is roughly 32.5MB,
while the instance provides almost 2TB of RAM.)
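
The generation-counter check described above can be modeled in user space
roughly as follows. This is only a sketch under stated assumptions: every
name in it (lowmem_gen, struct args_buf, args_free, madv_free_calls, and so
on) is hypothetical and not taken from the kernel sources, and a plain
counter stands in for the actual MADV_FREE call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names come from the kernel. */
#define NCPU 4

/* Incremented by the simulated lowmem handler on each low-memory event. */
unsigned int lowmem_gen;

/* One cached exec args buffer per CPU. */
struct args_buf {
	unsigned int gen;	/* lowmem generation this buffer last honored */
	bool resident;		/* true while the buffer may hold dirty pages */
};

struct args_buf bufs[NCPU];

/* Counts how often the expensive path (standing in for MADV_FREE) runs. */
unsigned int madv_free_calls;

/* Simulated lowmem handler: only records that a reclaim was requested. */
void
lowmem_handler(void)
{
	lowmem_gen++;
}

/* Simulated execve start: the buffer is filled, so its pages are dirty. */
void
args_alloc(struct args_buf *b)
{
	assert(b != NULL);
	b->resident = true;
}

/*
 * Simulated execve end: take the expensive path only when a lowmem event
 * has occurred since this buffer last honored one.
 */
void
args_free(struct args_buf *b)
{
	assert(b != NULL);
	if (b->gen != lowmem_gen) {
		b->gen = lowmem_gen;
		b->resident = false;	/* pages handed back for reclaim */
		madv_free_calls++;
	}
}
```

In this model, repeated args_alloc()/args_free() cycles leave
madv_free_calls untouched until lowmem_handler() runs; the next free of
each buffer then takes the expensive path exactly once.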