This gives us lower latency and lower interconnect bandwidth usage on many benchmarks that process a lot of pages. We should probably also affinitize the vm threads after this for maximum benefit.
I used a separate pml4 entry because it made life easier for me. There isn't a strong technical reason to do so. Given that the page array is 3% of memory, I don't think this is an unreasonable use. There is a small downside: you can no longer skip allocating pages for the page array itself, since it is not at the very end of memory anymore. This wastes 3% of 3%, or about 0.1%, of memory. I find this acceptable.