Page MenuHomeFreeBSD

amd64: for small cores, use (big hammer) INVPCID_CTXGLOB instead of INVLPG
ClosedPublic

Authored by kib on Dec 21 2022, 11:27 AM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Dec 3, 4:59 AM
Unknown Object (File)
Mon, Nov 25, 6:43 PM
Unknown Object (File)
Sat, Nov 23, 10:51 AM
Unknown Object (File)
Fri, Nov 22, 3:01 PM
Unknown Object (File)
Thu, Nov 21, 9:27 AM
Unknown Object (File)
Tue, Nov 19, 11:55 PM
Unknown Object (File)
Tue, Nov 19, 10:09 PM
Unknown Object (File)
Tue, Nov 19, 11:27 AM

Details

Summary

A hypothetical CPU bug makes invalidation of global PTEs using INVLPG in pcid mode unreliable, it seems. The workaround is applied for all CPUs with small cores, since we do not know the scope of the issue, and the right fix.

PR: 261169, 266145

Also:
amd64: identify small cores

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

kib requested review of this revision.Dec 21 2022, 11:27 AM
sys/amd64/amd64/initcpu.c
324

Do you want to add macros for these constants? Identifying the E-cores will be useful even without this workaround, e.g. for the scheduler.

sys/amd64/amd64/initcpu.c
326

Perhaps add a short comment explaining what is being worked around here. Hopefully this workaround will be deleted soon, but it is worth describing.

sys/amd64/include/pmap.h
519 ↗(On Diff #114353)
pmap_invlpg(pmap_t pmap, vm_offset_t)
{
    if (pmap != kernel_pmap)
        return;
    ...
}

feels more intuitive. The compiler should be able to remove the branch in most callers.

Otherwise, it is not obvious to me what the "_pmap" suffix means.

kib marked 3 inline comments as done.Dec 21 2022, 5:27 PM
kib added inline comments.
sys/amd64/amd64/initcpu.c
324

The identification of small cores is recorded into pcpu data, so it is there. Still added constants.

326

Added a comment to pmap_invlpg().

sys/amd64/include/pmap.h
519 ↗(On Diff #114353)

Ok, I did it almost this way, but the function still needs to perform invalidation, so it cannot just return for !kernel_pmap.

kib marked 3 inline comments as done.

pmap_invlpg(), commented.
Add CPUID HYBRID leaf constants.

Do you have any guess as to how long we might be stuck with this workaround? If the need for it persists, then we may want to have a wrapper to optimize the range operations to avoid repeating INVPCID_CTXGLOB.

sys/amd64/amd64/pmap.c
10433 ↗(On Diff #114371)

pmap_quick_enter_page() doesn't set PG_G, so strictly speaking this case doesn't need to change.

sys/amd64/include/pcpu.h
102–103

pc_small_core?

103

pc_pcid_invlpg_bug would be more descriptive

sys/amd64/include/pmap.h
520 ↗(On Diff #114371)

"..., which results in the INVLPG instruction failing to flush all

Do you have any guess as to how long we might be stuck with this workaround?

At this point we have no idea. I'd like to get some workaround in the tree as soon as possible; if it looks like it's going to be permanent we can look at optimizations.

sys/amd64/amd64/pmap.c
535 ↗(On Diff #114371)

Should have a description, something like Use INVPCID_CTXGLOB to work around small core INVLPG microarchitectural bug or Enable small core INVLPG workaround perhaps?

I wonder about the name too, maybe pcid_invlpg_workaround instead, or something like force_invpcid? pcid_invlpg_bug might give the impression that this is just reporting whether the bug is present, I think.

@markj, thoughts?

kib marked 5 inline comments as done.

Avoid multiple global invalidations for range ops.
Misc. renames and comments improvements.

sys/x86/include/specialreg.h
494

What is this line? I'm not sure what "lead" means here, and there is a closing comment delimiter without an opening delimiter. Fixing this macro and using in initializecpu() seems like a good idea though.

kib marked an inline comment as done.Dec 21 2022, 7:13 PM
kib added inline comments.
sys/x86/include/specialreg.h
494

It is 'leaf'. And '*/' is ignored because the macro is not used.

kib marked an inline comment as done.

Fix silly thinkos in specialreg.h

Convert one unneeded pmap_invlpg() into plain invlpg().

I assume that you did not modify pmap_pti_pcid_invlpg and pmap_pti_pcid_invlrng in support.S because the broken do not require PTI. Is that correct?

sys/amd64/amd64/initcpu.c
329

This should be updated to match the sysctl name. More broadly, I would suggest updating the name of the local variable, use_invlpg_pmap, to be pcid_invlpg_workaround, and the same for the PCPU field's name.

sys/amd64/amd64/mp_machdep.c
889 ↗(On Diff #114391)

I'm confused by this change. Specifically, this function is used when invpcid_works is false. Is that ever going to be the case on the broken cores? Moreover, if the change is required here, then I would also expect to see a similar change in invlrng_pcid_handler().

sys/amd64/amd64/pmap.c
3779 ↗(On Diff #114391)

Because the virtual address being mapped here was dynamically allocated, I don't think that we should make the assumption that a stale TLB entry for vaddr didn't have PG_G set. So, I would use pmap_invlpg() here.

10427–10428 ↗(On Diff #114391)

On an unrelated note, I don't understand why the invlpg is performed in pmap_quick_remove_page() instead of here. If the processor prefetches neighboring TLB entries to the one being accessed (as some have been reported to do), then the spin lock does not prevent the situation described in the "AMD64 Architecture Programmer's Manual Volume 2: System Programming" rev. 3.23, "7.3.1 Special Coherency Considerations".

10441–10442 ↗(On Diff #114391)

Change "there" to "here". Also, the correctness of a simple invlpg depends on this virtual address being exclusively mapped by pmap_quick_enter_page(). If someplace else had created a mapping at qframe, then that other place might have included PG_G in the mapping, and we would have to uise pmap_invlpg() here:

"Since qframe is exclusively mapped by pmap_quick_enter_page() and that function doesn't set PG_G, we can use INVLPG here.

kib marked 5 inline comments as done.Dec 29 2022, 8:53 PM
In D37770#861155, @alc wrote:

I assume that you did not modify pmap_pti_pcid_invlpg and pmap_pti_pcid_invlrng in support.S because the broken do not require PTI. Is that correct?

No, rather because this is code that assumes that invlpcid_works == 0, and as I said in another comment, I did not fully converted this case.

sys/amd64/amd64/mp_machdep.c
889 ↗(On Diff #114391)

This is a partially (minimally) done attempt to keep code working even with invpcid_works manually set to zero. I reverted this change for now.

sys/amd64/amd64/pmap.c
10427–10428 ↗(On Diff #114391)

The change is added to the branch as a separate commit.

kib marked 2 inline comments as done.

Handle Alan comments.
Rename the local var and pcpu member.
Remove partially done support for invlpcid_works == 0.
qenter() fix.

This revision is now accepted and ready to land.Dec 29 2022, 10:21 PM

Problem is 231d75568f16 and cde70e312c3f add the requirement to #include <sys/systm.h> and #include <sys/pcpu.h> before a #include <vm/vm.h>. Otherwise you end up with implicit declarations of PCPU_GET() and invlpg(), causing build failures.

Previously:

#include <sys/param.h>
#include <sys/kernel.h>

#include <vm/vm.h>
#include <vm/vm_page.h>

Was acceptable on x86, but now fails. This continues to work on other architectures. Could be nothing else has exploded since it is rare to include those two without sys/pcpu.h. The file in question is presently being tested prior to ending up on main.

Problem is 231d75568f16 and cde70e312c3f add the requirement to #include <sys/systm.h> and #include <sys/pcpu.h> before a #include <vm/vm.h>. Otherwise you end up with implicit declarations of PCPU_GET() and invlpg(), causing build failures.

Previously:

#include <sys/param.h>
#include <sys/kernel.h>

#include <vm/vm.h>
#include <vm/vm_page.h>

Was acceptable on x86, but now fails. This continues to work on other architectures. Could be nothing else has exploded since it is rare to include those two without sys/pcpu.h. The file in question is presently being tested prior to ending up on main.

Are you complaining about a build failure in not-yet-committed code? It took non-trivial efforts for me to understand this.

I will commit the following when some other change on the same branch is ready.

diff --git a/sys/amd64/include/pmap.h b/sys/amd64/include/pmap.h
index e7497c2f8b4b..a7e456d80fcf 100644
--- a/sys/amd64/include/pmap.h
+++ b/sys/amd64/include/pmap.h
@@ -516,6 +516,7 @@ pmap_invalidate_cpu_mask(pmap_t pmap)
   return (&pmap->pm_active);
 }
 
+#if defined(_SYS_PCPU_H_) && defined(_MACHINE_CPUFUNC_H_)
 /*
  * It seems that AlderLake+ small cores have some microarchitectural
  * bug, which results in the INVLPG instruction failing to flush all
@@ -533,6 +534,7 @@ pmap_invlpg(pmap_t pmap, vm_offset_t va)
      invlpg(va);
   }
 }
+#endif /* sys/pcpu.h && machine/cpufunc.h */
 
 #endif /* _KERNEL */
In D37770#862116, @kib wrote:

Are you complaining about a build failure in not-yet-committed code? It took non-trivial efforts for me to understand this.

Sorry about the poor message, but reads like you've got it. Specifically D28982 runs into this. I'm unsure of time-frame for when it is likely to get in.