- Move to per-CPU overflow entries, since we model each entry as a 32-bit
extension to a 32-bit per-CPU register.
- Attempt to address a race between overflow accounting and the overflows
themselves, which may occur (for example) if a counter overflows while
interrupts are disabled during a context switch.
With these changes, I no longer experience panics when using per-process
counters on a multithreaded process.
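
For illustration only, the idea behind both changes can be sketched as a
small user-space simulation. This is not the code in this commit: every
name below (NCPU, struct cpu_counter, hw_count, overflow_intr,
read_counter) is hypothetical, and the "pending" flag stands in for
whatever mechanism the hardware uses to signal an overflow that has not
yet been serviced. Each CPU has its own 32-bit overflow count, which acts
as the upper half of a 64-bit value whose lower half is the 32-bit
register. The reader compensates for an overflow that has happened but has
not yet been counted, as could be the case while interrupts are off.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU 4	/* hypothetical CPU count for this sketch */

/* One entry per CPU: a simulated 32-bit register plus its extension. */
struct cpu_counter {
	uint32_t hw;		/* simulated 32-bit hardware counter */
	uint32_t overflow;	/* software-maintained upper 32 bits */
	int	 pending;	/* overflow occurred but is not yet counted */
};

static struct cpu_counter counters[NCPU];

static struct cpu_counter *
counter_for(int cpu)
{
	assert(cpu >= 0 && cpu < NCPU);
	return (&counters[cpu]);
}

/* Simulate the hardware counting n events; a wrap raises "pending". */
static void
hw_count(int cpu, uint32_t n)
{
	struct cpu_counter *c = counter_for(cpu);
	uint32_t old = c->hw;

	c->hw += n;		/* unsigned arithmetic wraps modulo 2^32 */
	if (c->hw < old)
		c->pending = 1;
}

/* Simulated overflow interrupt: fold the pending wrap into the count. */
static void
overflow_intr(int cpu)
{
	struct cpu_counter *c = counter_for(cpu);

	if (c->pending) {
		c->pending = 0;
		c->overflow++;
	}
}

/*
 * Read the 64-bit value. If an overflow is pending (for example because
 * the interrupt has been held off), account for it here and re-read the
 * low half so that both halves describe the same wrap.
 */
static uint64_t
read_counter(int cpu)
{
	struct cpu_counter *c = counter_for(cpu);
	uint32_t hi = c->overflow;
	uint32_t lo = c->hw;

	if (c->pending) {
		hi++;
		lo = c->hw;
	}
	return (((uint64_t)hi << 32) | lo);
}
```

In this sketch, a read taken between the wrap and the delivery of the
overflow interrupt returns the same value as a read taken after the
interrupt has run, and a wrap on one CPU never affects the value read on
another.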