According to SDM rev. 69 vol. 3, for PDPTE register loads:
- when PAT is not supported, access to the pdir page is performed as UC, see 4.9.1;
- when PAT is supported, the access is WB, see 4.9.2.
So the CPU could potentially load stale memory as PDPTEs if neither PAT nor self-snoop is implemented. To be safe, add a full local cache flush to pmap_cold() before the initial load of cr3, and flush pdir in pmap_pinit() if PAT is not implemented.
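The two conditions can be sketched as predicates over the CPUID.01H:EDX feature word. This is only an illustration, not the code in this change: the helper names are hypothetical, and the sketch assumes the standard CPUID bit positions for PAT (bit 16) and self-snoop (bit 27).

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.01H:EDX feature bits. */
#define CPUID_EDX_PAT	(UINT32_C(1) << 16)	/* Page Attribute Table */
#define CPUID_EDX_SS	(UINT32_C(1) << 27)	/* Self Snoop */

/*
 * Hypothetical helper: true when a PDPTE load might observe stale
 * memory, i.e. the load is done as UC (no PAT) and the CPU does not
 * snoop its own caches (no self-snoop).
 */
bool
pdpte_load_may_be_stale(uint32_t cpuid_edx)
{
	bool has_pat = (cpuid_edx & CPUID_EDX_PAT) != 0;
	bool has_ss = (cpuid_edx & CPUID_EDX_SS) != 0;

	return (!has_pat && !has_ss);
}

/*
 * Hypothetical helper: the condition described above for flushing
 * pdir in pmap_pinit(), which depends only on PAT being absent.
 */
bool
pdir_flush_wanted(uint32_t cpuid_edx)
{
	return ((cpuid_edx & CPUID_EDX_PAT) == 0);
}
```

The second predicate is deliberately broader than the first: it asks for the flush whenever PAT is missing, so whether the flush can be skipped on a self-snooping CPU is left to the flush routine itself.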
PS. Because of a note in 4.9.2, I was tempted to always flush pdir, but then backed off.