With some measures in place to reduce free page and page queue lock
contention, we see a lot of kernel map lock contention during
highly parallel builds. This turns out to be the result of parallel
calls to exec_alloc_args(), which allocates a large KVA range that
is released at the end of the execve call. We currently always
reserve 16 * 260KB for this purpose in a submap, exec_map. This is
excessive on small systems and not enough on large systems. Moreover,
exec_map is
used mainly for execve arguments, meaning that each allocation is the
same size, so the overhead of maintaining the vm_map splay tree is
mostly unnecessary.

This change modifies the exec_map sizing policy so that it depends on
the number of CPUs. It also allows an execve argument range to be cached
per CPU, letting execve avoid the exec_map sx lock in the common case.
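
The idea can be illustrated with a small userspace sketch. It is not the
kernel code: the mutex stands in for the exec_map lock, malloc() stands in
for a KVA allocation from the submap, and the names args_alloc(),
args_free(), exec_map_size(), ENTRIES_PER_CPU and NCPU are all hypothetical.
The scaling factor in particular is an assumption chosen for illustration.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>
#include <pthread.h>

#define NCPU            4               /* assumed CPU count for this sketch */
#define ARGS_SIZE       (260 * 1024)    /* size of one execve argument range */
#define ENTRIES_PER_CPU 2               /* hypothetical scaling factor */

/* Stand-in for the exec_map lock; taken only on the slow path. */
static pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;

/* One cached argument range per CPU; NULL means the slot is empty. */
static _Atomic(void *) cached[NCPU];

/* Counts how often the locked slow path was needed. */
static unsigned long slow_allocs;

/* Size the map by CPU count instead of using a fixed 16 entries. */
size_t
exec_map_size(unsigned ncpu)
{
	return ((size_t)ncpu * ENTRIES_PER_CPU * ARGS_SIZE);
}

void *
args_alloc(unsigned cpu)
{
	/* Fast path: take this CPU's cached range without the map lock. */
	void *p = atomic_exchange(&cached[cpu], (void *)NULL);

	if (p != NULL)
		return (p);

	/* Slow path: allocate a fresh range under the lock. */
	pthread_mutex_lock(&map_lock);
	p = malloc(ARGS_SIZE);
	slow_allocs++;
	pthread_mutex_unlock(&map_lock);
	return (p);
}

void
args_free(unsigned cpu, void *p)
{
	void *expected = NULL;

	/* Fast path: park the range in this CPU's slot if it is empty. */
	if (atomic_compare_exchange_strong(&cached[cpu], &expected, p))
		return;

	/* Slow path: the slot is occupied, so release under the lock. */
	pthread_mutex_lock(&map_lock);
	free(p);
	pthread_mutex_unlock(&map_lock);
}
```

In this sketch, an alloc/free/alloc sequence on the same CPU touches the
lock only once, because the second allocation reuses the cached range.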