
amd64 pmap: enable the use of INVPCID_CTXGLOB on AMD Ryzen processors
ClosedPublic

Authored by alc on Jul 26 2025, 9:05 PM.

Details

Summary

Recent AMD Ryzen processors support a limited form of the invpcid instruction, even when they do not support PCID functionality. In particular, they support the type 2 form of the instruction, which we call INVPCID_CTXGLOB and which invalidates all TLB entries, including global ones. This is supposedly faster than toggling PGE in cr4.
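
To make the selection described above concrete, here is a minimal userland sketch, not the code in pmap.c. The type codes and the CPUID feature bit are architectural; the enum `global_flush` and the function `choose_global_flush` are hypothetical names introduced only for illustration, and the rule that PCID support is ignored is an assumption drawn from the summary.

```c
#include <stdbool.h>
#include <stdint.h>

/* INVPCID type codes; the names follow the ones used in this review. */
#define INVPCID_ADDR    0	/* one address within one PCID */
#define INVPCID_CTX     1	/* one PCID, non-global entries */
#define INVPCID_CTXGLOB 2	/* all PCIDs, including global entries */
#define INVPCID_ALLCTX  3	/* all PCIDs, global entries retained */

/* CPUID.(EAX=7,ECX=0):EBX bit 10 reports INVPCID support. */
#define STDEXT_INVPCID  (1u << 10)

/* Hypothetical labels for the two ways to flush global TLB entries. */
enum global_flush { FLUSH_TOGGLE_CR4_PGE, FLUSH_INVPCID_CTXGLOB };

/*
 * Assumed selection rule: the INVPCID feature bit alone is enough to
 * use the type 2 form, so PCID support is deliberately not consulted.
 */
static enum global_flush
choose_global_flush(uint32_t cpuid7_ebx, bool pcid_supported)
{
	(void)pcid_supported;
	return ((cpuid7_ebx & STDEXT_INVPCID) != 0 ?
	    FLUSH_INVPCID_CTXGLOB : FLUSH_TOGGLE_CR4_PGE);
}
```

With this rule, a processor that reports INVPCID but not PCID still takes the INVPCID_CTXGLOB path, which is the case the summary describes for recent Ryzen parts.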

Test Plan

We don't perform pmap_invalidate_all(kernel_pmap) often, so I reduced PMAP_INVLPG_THRESHOLD to 33 so that it would get exercised more.
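
A rough sketch of why lowering the threshold exercises the full-invalidation path more often. It assumes that the threshold is compared against the byte length of the range being invalidated; that comparison, the helper `range_uses_full_flush`, and the constant `SKETCH_PAGE_SIZE` are assumptions for illustration and are not taken from the diff.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* amd64 base page size */

/*
 * Assumed policy: a range at least as long as the threshold is handled
 * by a full invalidation instead of per-page invalidations.
 */
static bool
range_uses_full_flush(uint64_t sva, uint64_t eva, uint64_t threshold)
{
	return (eva - sva >= threshold);
}
```

Under that assumption, a threshold of 33 sends even a single-page range to the full-invalidation path, so pmap_invalidate_all(kernel_pmap) runs far more often during testing.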

Diff Detail

Repository
rG FreeBSD src repository

Event Timeline

alc requested review of this revision. Jul 26 2025, 9:05 PM
alc created this revision.
sys/amd64/amd64/pmap.c
3430–3432

Should we make this the target of the DEFINE_IFUNC instead of pmap_invalidate_all_cb()? It would eliminate the following indirection:

ffffffff8105a6c0 <pmap_invalidate_all_curcpu_cb>:
ffffffff8105a6c0: 55                    pushq   %rbp
ffffffff8105a6c1: 48 89 e5              movq    %rsp, %rbp
ffffffff8105a6c4: 5d                    popq    %rbp
ffffffff8105a6c5: e9 00 00 00 00        jmp     0xffffffff8105a6ca <pmap_invalidate_all_curcpu_cb+0xa>
ffffffff8105a6ca: 66 0f 1f 44 00 00     nopw    (%rax,%rax)
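
The indirection in the disassembly above can be modeled in portable C. This sketch uses a plain function pointer resolved at startup rather than a real ifunc, and every name in it (`resolve`, `curcpu_cb_wrapper`, `curcpu_cb_direct`, the counters) is hypothetical. It only illustrates the difference between a wrapper that forwards to the resolved target and exposing the resolved target itself.

```c
#include <stddef.h>

typedef void (*inval_cb_t)(void);

/* Counters so the effect of each call can be observed. */
static int calls_invpcid, calls_cr4;

static void inval_invpcid(void) { calls_invpcid++; }
static void inval_cr4(void) { calls_cr4++; }

/* Stand-in for the ifunc resolver: picks the implementation once. */
static inval_cb_t resolved_cb;

static void
resolve(int has_invpcid)
{
	resolved_cb = has_invpcid ? inval_invpcid : inval_cr4;
}

/* Layout A: a wrapper whose only job is to jump to the resolved target. */
static void
curcpu_cb_wrapper(void)
{
	resolved_cb();
}

/* Layout B: callers receive the resolved target, with no extra hop. */
static inval_cb_t
curcpu_cb_direct(void)
{
	return (resolved_cb);
}
```

Both layouts reach the same implementation; layout B simply removes the forwarding function, which is the saving the comment asks about.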
sys/amd64/amd64/pmap.c
3392

If we have INVPCID, it might be better to do INVPCID(CTX) (or INVPCID(ALLCTX), which should be the same) there instead of reloading %cr3. That might make life easier for hardware that snoops page table accesses.

3430–3432

No objections.

sys/amd64/amd64/pmap.c
3392

There is an old thread on the Linux kernel mailing list from the Skylake era (2016) where they claim that writing to cr3 was surprisingly faster. Unless I am misreading their code, they appear to have stuck with that approach to this day.

alc marked an inline comment as done. Jul 27 2025, 5:08 PM
kib added inline comments.
sys/amd64/amd64/pmap.c
3392

That would be used on Zen 4/5 AFAIU, not Skylake machines. But ok.

This revision is now accepted and ready to land. Jul 27 2025, 5:17 PM

Eliminate unnecessary indirection.

This revision now requires review to proceed. Jul 27 2025, 9:06 PM
alc marked an inline comment as done. Jul 27 2025, 9:16 PM
alc added inline comments.
sys/amd64/amd64/pmap.c
3392

Yes, but I have not found any performance claims about whether invpcid or writing to cr3 is faster on Zen 3/4/5.

This revision is now accepted and ready to land. Jul 27 2025, 9:21 PM
This revision was automatically updated to reflect the committed changes.
alc marked an inline comment as done.