As additional load for stress2, a program was used that runs a thread on each core, writes some values into the bases, and expects to read them back both by instructions and by sysarch(2). See https://gist.github.com/23f6b30e72e6bc8738196fc6c7e2e39c.
hwpmc(4) was used to generate NMIs.