Recent AMD Ryzen processors support a limited form of the invpcid instruction, even when they do not support PCID functionality. In particular, they support the type 2 form of the instruction (invalidate all contexts, including global mappings), which we call INVPCID_CTXGLOB. This is supposedly faster than toggling PGE in cr4.
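As an illustration of the idea in the summary, here is a minimal sketch, not the committed pmap.c change. It assumes the decision can be made from the architectural CPUID bits alone (INVPCID in CPUID.(EAX=07H,ECX=0):EBX bit 10, PCID in CPUID.01H:ECX bit 17). The names `choose_glob_flush`, `enum glob_flush`, `invpcid_ctxglob`, and the `CPUID*` macros are hypothetical; only INVPCID_CTXGLOB comes from the summary.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural CPUID feature bits. */
#define CPUID1_ECX_PCID		(1u << 17)	/* CPUID.01H:ECX[17] */
#define CPUID7_EBX_INVPCID	(1u << 10)	/* CPUID.(EAX=07H,ECX=0):EBX[10] */

/* INVPCID type 2: invalidate all contexts, including global mappings. */
#define INVPCID_CTXGLOB		2

/*
 * 128-bit INVPCID descriptor. The PCID lives in the low 12 bits of the
 * first quadword; the second quadword holds a linear address.
 */
struct invpcid_descr {
	uint64_t pcid;
	uint64_t addr;
};
static_assert(sizeof(struct invpcid_descr) == 16,
    "INVPCID descriptor must be 128 bits");

/* Hypothetical names for the two ways to flush global TLB entries. */
enum glob_flush { FLUSH_TOGGLE_CR4_PGE, FLUSH_INVPCID_CTXGLOB };

/*
 * Pick a strategy from raw CPUID words (assumption: only the INVPCID bit
 * matters). The PCID bit may be clear, as on the Ryzen parts described in
 * the summary, and the instruction is still usable.
 */
static enum glob_flush
choose_glob_flush(uint32_t cpuid1_ecx, uint32_t cpuid7_ebx)
{
	(void)cpuid1_ecx;	/* deliberately ignored: PCID is not required */
	return ((cpuid7_ebx & CPUID7_EBX_INVPCID) != 0 ?
	    FLUSH_INVPCID_CTXGLOB : FLUSH_TOGGLE_CR4_PGE);
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
/*
 * Ring 0 only: executing invpcid in user mode faults, so this helper is
 * shown for reference and is never called in this sketch.
 */
static inline void
invpcid_ctxglob(void)
{
	struct invpcid_descr d = { 0, 0 };	/* zeroed descriptor */

	__asm__ __volatile__("invpcid (%0),%1"
	    : : "r" (&d), "r" ((uint64_t)INVPCID_CTXGLOB) : "memory");
}
#endif
```

With these assumptions, `choose_glob_flush(0, CPUID7_EBX_INVPCID)` selects the invpcid path even though the PCID bit is clear, which is the case the summary describes.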
Details
- Reviewers: kib, markj
- Commits: rG6fb848f2ff91: amd64 pmap: Use INVPCID_CTXGLOB on Ryzen processors
We don't perform pmap_invalidate_all(kernel_pmap) often, so I reduced PMAP_INVLPG_THRESHOLD to 33 so that it would get exercised more.
Diff Detail
- Repository: rG FreeBSD src repository
- Lint: Lint Not Applicable
- Unit: Tests Not Applicable
Event Timeline
sys/amd64/amd64/pmap.c | ||
---|---|---|
3430–3432 | Should we make this the target of the DEFINE_IFUNC instead of pmap_invalidate_all_cb()? It would eliminate the following indirection: |

    ffffffff8105a6c0 <pmap_invalidate_all_curcpu_cb>:
    ffffffff8105a6c0: 55                      pushq %rbp
    ffffffff8105a6c1: 48 89 e5                movq %rsp, %rbp
    ffffffff8105a6c4: 5d                      popq %rbp
    ffffffff8105a6c5: e9 00 00 00 00          jmp 0xffffffff8105a6ca <pmap_invalidate_all_curcpu_cb+0xa>
    ffffffff8105a6ca: 66 0f 1f 44 00 00       nopw (%rax,%rax)

sys/amd64/amd64/pmap.c | ||
---|---|---|
3392 | There is an old thread on the Linux kernel mailing list from the Skylake era (2016) where they claim that writing to cr3 was, surprisingly, faster than using invpcid. Unless I am misreading their code, they appear to have stuck with that approach to this day. |
sys/amd64/amd64/pmap.c | ||
---|---|---|
3392 | That would be used on Zen 4/5 AFAIU, not Skylake machines. But ok. |
sys/amd64/amd64/pmap.c | ||
---|---|---|
3392 | Yes, but I have not found any performance claims about whether invpcid or writing to cr3 is faster on Zen 3/4/5. |