
amd64 pmap: enable the use of INVPCID_CTXGLOB on AMD Ryzen processors
ClosedPublic

Authored by alc on Jul 26 2025, 9:05 PM.

Details

Summary

Recent AMD Ryzen processors support a limited form of the invpcid instruction, even when they do not support PCID functionality. In particular, they support the type 2 form of the instruction, which we call INVPCID_CTXGLOB and which invalidates all TLB entries, including global ones. This is supposedly faster than toggling PGE in cr4.
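To illustrate the idea in the summary, here is a minimal C sketch of choosing a global TLB flush method from CPUID feature words. The names pick_global_flush and flush_method are hypothetical and do not come from pmap.c, and it is an assumption that the kernel keys the decision on the INVPCID CPUID bit alone. The bit positions are the architectural ones: CPUID.1:ECX bit 17 for PCID and CPUID.(EAX=7,ECX=0):EBX bit 10 for INVPCID. The function takes the raw CPUID words as arguments, so it executes no privileged instruction.

```c
#include <stdint.h>

/* INVPCID invalidation types, passed in the instruction's register operand. */
enum invpcid_type {
	INVPCID_ADDR = 0,	/* one address within one PCID */
	INVPCID_CTX = 1,	/* one PCID, excluding global entries */
	INVPCID_CTXGLOB = 2,	/* all PCIDs, including global entries */
	INVPCID_ALLCTX = 3	/* all PCIDs, excluding global entries */
};

#define CPUID_1_ECX_PCID	(1u << 17)	/* CPUID.1:ECX bit 17 */
#define CPUID_7_EBX_INVPCID	(1u << 10)	/* CPUID.(7,0):EBX bit 10 */

/* Hypothetical labels for the two ways of flushing global TLB entries. */
enum flush_method {
	FLUSH_INVPCID_CTXGLOB,	/* invpcid with type INVPCID_CTXGLOB */
	FLUSH_CR4_PGE_TOGGLE	/* clear and then restore CR4.PGE */
};

/*
 * Assumption: only the INVPCID bit matters.  The PCID bit is accepted
 * but deliberately ignored, to model a CPU that has a usable type 2
 * invpcid without PCID support.
 */
static enum flush_method
pick_global_flush(uint32_t cpuid1_ecx, uint32_t cpuid7_ebx)
{
	(void)cpuid1_ecx;
	if ((cpuid7_ebx & CPUID_7_EBX_INVPCID) != 0)
		return (FLUSH_INVPCID_CTXGLOB);
	return (FLUSH_CR4_PGE_TOGGLE);
}
```

In this model, a CPU that reports INVPCID without PCID still takes the invpcid path, while a CPU that reports PCID without INVPCID falls back to toggling PGE.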

Test Plan

We don't perform pmap_invalidate_all(kernel_pmap) often, so I reduced PMAP_INVLPG_THRESHOLD to 33 to exercise it more.
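The effect of lowering the threshold can be modeled with a small C sketch. It assumes that a range invalidation escalates to a full invalidation once the byte length of the range reaches the threshold, and that the stock threshold is 4 * 1024 pages of 4 KB each; both are assumptions about pmap.c, not facts stated in this review. The helper name use_full_invalidation and the MODEL_ constants are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE		4096ULL
/* Assumed stock value; treat it as illustrative only. */
#define MODEL_DEFAULT_THRESHOLD	(4 * 1024 * MODEL_PAGE_SIZE)
/* The value used in the test plan above. */
#define MODEL_TEST_THRESHOLD	33ULL

/*
 * Returns true when invalidating [sva, eva) would escalate to a full
 * invalidation, assuming the comparison is made on the byte length.
 */
static bool
use_full_invalidation(uint64_t sva, uint64_t eva, uint64_t threshold)
{
	return (eva - sva >= threshold);
}
```

Under these assumptions, a one-page range stays on the per-page path with the stock threshold but escalates to a full invalidation with the threshold set to 33, which is why the reduced value exercises the new code path much more often.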

Diff Detail

Lint: Skipped
Unit Tests: Skipped

Event Timeline

alc requested review of this revision. Jul 26 2025, 9:05 PM
alc created this revision.
sys/amd64/amd64/pmap.c
3430–3432

Should we make this the target of the DEFINE_IFUNC instead of pmap_invalidate_all_cb()? It would eliminate the following indirection:

ffffffff8105a6c0 <pmap_invalidate_all_curcpu_cb>:
ffffffff8105a6c0: 55                    pushq   %rbp
ffffffff8105a6c1: 48 89 e5              movq    %rsp, %rbp
ffffffff8105a6c4: 5d                    popq    %rbp
ffffffff8105a6c5: e9 00 00 00 00        jmp     0xffffffff8105a6ca <pmap_invalidate_all_curcpu_cb+0xa>
ffffffff8105a6ca: 66 0f 1f 44 00 00     nopw    (%rax,%rax)
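The indirection in question can be modeled in portable C with function pointers standing in for ifunc-bound symbols. This is only a sketch: every name below is hypothetical, and the real code uses DEFINE_IFUNC rather than explicit pointers. It contrasts a wrapper that forwards to a resolved function with binding the externally called name directly to the resolved function.

```c
#include <stdbool.h>

typedef void (*inval_cb_t)(void);

/* Counters so the model can show which implementation ran. */
static int invpcid_calls, pge_calls;

static void inval_all_invpcid(void) { invpcid_calls++; }
static void inval_all_pge(void) { pge_calls++; }

/* Stand-in for an ifunc resolver: runs once and picks an implementation. */
static inval_cb_t
resolve_inval_all(bool have_invpcid)
{
	return (have_invpcid ? inval_all_invpcid : inval_all_pge);
}

/*
 * Before: the resolved function sits behind a wrapper, so each call
 * pays for one extra jump, as in the disassembly above.
 */
static inval_cb_t inval_all_cb;
static void inval_all_curcpu_wrapper(void) { inval_all_cb(); }

/*
 * After: the name that callers use is itself bound by the resolver,
 * so a call reaches the implementation directly.
 */
static inval_cb_t inval_all_curcpu;
```

Both arrangements run the same implementation; the second simply removes the forwarding hop.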
sys/amd64/amd64/pmap.c
3392

If we have INVPCID, it might be better to do INVPCID(CTX) (or INVPCID(ALLCTX), which should be the same) there instead of reloading %cr3. It might make life easier for hardware that snoops page table accesses.

3430–3432

No objections.

sys/amd64/amd64/pmap.c
3392

There is an old thread on the Linux kernel mailing list from the Skylake era (2016) where they claim that writing to cr3 was surprisingly faster. Unless I am misreading their code, they appear to have stuck with that approach to this day.

alc marked an inline comment as done. Jul 27 2025, 5:08 PM
kib added inline comments.
sys/amd64/amd64/pmap.c
3392

That would be used on Zen 4/5, as far as I understand, not Skylake machines. But ok.

This revision is now accepted and ready to land. Jul 27 2025, 5:17 PM

Eliminate unnecessary indirection.

This revision now requires review to proceed. Jul 27 2025, 9:06 PM
alc marked an inline comment as done. Jul 27 2025, 9:16 PM
alc added inline comments.
sys/amd64/amd64/pmap.c
3392

Yes, but I have not found any performance claims about whether invpcid or writing to cr3 is faster on Zen 3/4/5.

This revision is now accepted and ready to land. Jul 27 2025, 9:21 PM
This revision was automatically updated to reflect the committed changes.
alc marked an inline comment as done.