This change adds support for transparent superpages for PowerPC64
systems using Hashed Page Tables (HPT). All pmap operations are
supported.
The changes were inspired by the RISC-V implementation of superpages
by @markj (r344106), but heavily adapted to fit the PPC64 HPT architecture
and the existing MMU OEA64 code.
Until these changes are better tested, superpage support is disabled by default.
To enable it, set `vm.pmap.pg_ps_enabled=1`.
The focus of this initial implementation is on correctness, so performance is not very
good yet. Below are the buildworld times on a POWER8 machine with 32GB of RAM, running a
CURRENT kernel (r362045) with the GENERIC config:
```
* Without D25237:
>>> World built in 6031 seconds, ncpu: 80, make -j32
* With D25237 and vm.pmap.pg_ps_enabled=1:
>>> World built in 6183 seconds, ncpu: 80, make -j32
Slowdown: ~2.5%
* With D25237 and vm.pmap.pg_ps_enabled=0:
>>> World built in 6105 seconds, ncpu: 80, make -j32
Slowdown: ~1.2%
```
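For reference, the slowdown figures above follow from the relative increase in build time over the baseline. A minimal sketch of that arithmetic, using only the times reported above (the function and variable names are illustrative, not part of the patch):

```python
def slowdown_percent(baseline_s: float, patched_s: float) -> float:
    """Relative increase in build time over the baseline, in percent."""
    return (patched_s - baseline_s) / baseline_s * 100.0


# Build times in seconds, taken from the results above.
BASELINE_S = 6031  # without D25237
PATCHED_S = {
    "vm.pmap.pg_ps_enabled=1": 6183,
    "vm.pmap.pg_ps_enabled=0": 6105,
}

for label, seconds in PATCHED_S.items():
    print(f"{label}: {slowdown_percent(BASELINE_S, seconds):.1f}%")
```

This prints 2.5% for the enabled case and 1.2% for the disabled case, matching the rounded figures in the results.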
Despite the current performance overhead on buildworld, some workloads already show a significant performance boost, mainly those that put heavy pressure on the TLB.
An example is the RandomAccess test from HPC Challenge, which performs many random accesses to a large memory area. With superpages enabled, a 60% boost was measured on a POWER8 machine and 23% on Talos.
Database programs are also said to benefit from superpages. Running pgbench showed a boost of about 5% on POWER8 and 8.4% on Talos, taking the average TPS (transactions per second) from 10 select-only runs of 5 seconds each (`pgbench -S -T 5`). When running for longer periods or together with updates, disk access time ends up dominating and the gains dissipate. pgbench was run on a test database with scale factor 150, with a single thread and a single client, to minimize other sources of inefficiency. The database was probably not big enough to take full advantage of superpages.
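The pgbench measurement described above could be reproduced with something like the following sketch. It is not the script used for the numbers in this review. The helper names and the database name are hypothetical, and the parser assumes pgbench reports results on lines starting with `tps = `, taking the last such line when several are printed.

```python
import re
import statistics
import subprocess

# Matches result lines such as "tps = 1234.567890 (...)".
TPS_RE = re.compile(r"^tps = ([0-9.]+)", re.MULTILINE)


def parse_tps(output: str) -> float:
    """Return the TPS value from the last 'tps = ...' line of pgbench output."""
    matches = TPS_RE.findall(output)
    if not matches:
        raise ValueError("no 'tps = ' line found in pgbench output")
    return float(matches[-1])


def run_select_only(dbname: str, seconds: int = 5) -> float:
    """Run one select-only pgbench pass with a single client and thread."""
    result = subprocess.run(
        ["pgbench", "-S", "-T", str(seconds), "-c", "1", "-j", "1", dbname],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_tps(result.stdout)


def average_tps(dbname: str, runs: int = 10, seconds: int = 5) -> float:
    """Average the TPS over several short select-only runs."""
    return statistics.mean([run_select_only(dbname, seconds) for _ in range(runs)])
```

Calling `average_tps("bench")` once with superpages disabled and once with them enabled gives the two averages to compare, where `bench` stands for a database previously initialized with `pgbench -i -s 150`.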