Page MenuHomeFreeBSD

Use a spin lock to serialize updates on AMD CPUs.
AbandonedPublic

Authored by markj on May 24 2018, 6:48 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mar 4 2024, 1:41 AM
Unknown Object (File)
Feb 27 2024, 7:08 AM
Unknown Object (File)
Feb 14 2024, 8:32 PM
Unknown Object (File)
Jan 13 2024, 8:41 AM
Unknown Object (File)
Dec 20 2023, 8:38 AM
Unknown Object (File)
Nov 22 2023, 9:53 AM
Unknown Object (File)
Nov 16 2023, 12:03 PM
Unknown Object (File)
Nov 7 2023, 11:46 AM
Subscribers
None

Details

Summary

The Zen microarchitecture supports SMT, and our AMD microcode update
code will cause hardware threads to attempt the update simultaneously.
The Intel SDM explicitly warns against doing this. AMD does not have any
public-facing documentation about microcode updates that I can find, but
Linux serializes updates with a lock, so do the same here.

Test Plan

Untested as yet, I'll give it a try with a 0x15 family system.

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 17537
Build 17358: arc lint + arc unit

Event Timeline

markj edited the test plan for this revision. (Show Details)
markj added reviewers: emaste, avg, sbruno, cem.

I am curious if both (all) hardware threads within a core need to do the update or if it would be sufficient for just one thread per core to do it.

In D15560#328513, @avg wrote:

I am curious if both (all) hardware threads within a core need to do the update or if it would be sufficient for just one thread per core to do it.

For Intel, my understanding is that only one thread needs to perform the update. I'm not sure about AMD.

I think that it should be relatively easy to instruct threads to do either an update or just a spin.
Having said that, I won't object against this change if it fixes a problem that has been experienced.

In D15560#328515, @avg wrote:

I think that it should be relatively easy to instruct threads to do either an update or just a spin.
Having said that, I won't object against this change if it fixes a problem that has been experienced.

Are you suggesting something like the following instead?

if (CPU_ISSET(curcpu, &logical_cpus_mask))
    return;
wrmsr_safe(...);
...

The problem was found by code inspection. I'm working on code to permit early
loading of microcode and noticed the potential issue.

This had no effect on my AMD FX-8150, which is to be expected.

May 24 13:45:03 lab microcode_update[1315]: /usr/local/share/cpucontrol/microcode_amd_fam15h.bin: updating cpu /dev/cpuctl0 to revision 0x600063e... done.
May 24 13:45:04 lab kernel: CPU: AMD FX(tm)-8150 Eight-Core Processor            (3599.97-MHz K8-class CPU)
May 24 13:45:04 lab kernel:   Origin="AuthenticAMD"  Id=0x600f12  Family=0x15  Model=0x1  Stepping=2
May 24 13:45:04 lab kernel:   Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
May 24 13:45:04 lab kernel:   Features2=0x1e98220b<SSE3,PCLMULQDQ,MON,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,AESNI,XSAVE,OSXSAVE,AVX>
May 24 13:45:04 lab kernel:   AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
May 24 13:45:04 lab kernel:   AMD Features2=0x1c9bfff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,XOP,SKINIT,WDT,LWP,FMA4,NodeId,Topology,PCXC,PNXC>
May 24 13:45:04 lab kernel:   SVM: NP,NRIP,VClean,AFlush,DAssist,NAsids=65536
May 24 13:45:04 lab kernel:   TSC: P-state invariant, performance statistics
This revision is now accepted and ready to land.May 24 2018, 7:46 PM

@markj something like that, yes.
I am not sure if logical_cpus_mask is accurate on Ryzen, but I think that we collect enough topology information to either make it correct or to provide another way of checking whether a CPU is a "primary" thread of a core.

In D15560#328520, @avg wrote:

@markj something like that, yes.
I am not sure if logical_cpus_mask is accurate on Ryzen, but I think that we collect enough topology information to either make it correct or to provide another way of checking whether a CPU is a "primary" thread of a core.

The alternative would be to crib what Linux does and use an undocumented MSR (0x8b) to read the current ucode revision, and only update if the loaded ucode file contains a different revision.

We already check MSR in userland (in cpucontrol), but treat it as a global.
In any case, all three alternatives are fine by me.
Although, my (weak) preference is to use the topology information.

Here's what I've gotten out of the Epyc PPR:

  1. Threads on a core share microcode. Only one set of microcode-update MSRs per core.
  2. All cores need to be loaded via write of pointer to PATCH_LOADER (0xC001_0020)
  3. We should check PATCH_LEVEL (0x0000_008B) on all cores after loading, not just a single one, to verify the patch was applied successfully.
  4. Unclear if updates to independent cores need to be serialized. Seems harmless โ€” upper bound of 100M cycles = 50 milliseconds at 2GHz. Even on 32-core Epyc that's max 1.6 seconds to load all cores, with a low-clock part.
  5. The memory pointed to can't cause a page or segmentation fault, but obviously that shouldn't happen for any valid kernel memory anyway.

The low 32 bits of PATCH_LEVEL are the patch level number and the high 32 bits are reserved. Zero indicates no patch.

PATCH_LOADER is a *linear* (virtual) address; 64 bits when running in 64 bit mode. Otherwise, upper 32 are ignored.

The header format of microcode updates is documented so we can verify it applies to this model, northbridge, and current patch level before attempting it. There is even sample C code for this. I would really hope attempting to apply an invalid update is protected by the CPU itself, but this isn't specified by the document.

Hope that helps.

Patch seems harmless.

Thanks a lot!
About this:

Unclear if updates to independent cores need to be serialized. Seems harmless โ€” upper bound of 100M cycles = 50 milliseconds at 2GHz. Even on 32-core Epyc that's max 1.6 seconds to load all cores, with a low-clock part.
The memory pointed to can't cause a page or segmentation fault, but obviously that shouldn't happen for any valid kernel memory anyway.

I changed the update code to what it is now (synchronized update), because on my Phenom II based systems the old update mode, core after core, driven by userland, caused hard hangs.
My assumption at the time was that the microcode can affect the inter-core communication (or HyperTransport), so it was necessary to keep all core safely "parked" and to update them at the same time.
I didn't have access to any non-public AMD documentation, so I don't know what really happened and what we were supposed to do, but that change fixed my problem and didn't seem to introduce any new problems.

In D15560#328553, @avg wrote:

I changed the update code to what it is now (synchronized update), because on my Phenom II based systems the old update mode, core after core, driven by userland, caused hard hangs.
My assumption at the time was that the microcode can affect the inter-core communication (or HyperTransport), so it was necessary to keep all core safely "parked" and to update them at the same time.
I didn't have access to any non-public AMD documentation, so I don't know what really happened and what we were supposed to do, but that change fixed my problem and didn't seem to introduce any new problems.

I don't see anything to that effect in the 17h docs, although of course it's possible it is just undocumented.

bump. @markj Did you want to shovel this into head?

bump. @markj Did you want to shovel this into head?

If anyone with a Ryzen that has SMT enabled could print logical_cpus_mask (e.g., from kgdb) and paste it here, I'd appreciate it. I'd prefer to use that instead of a spin lock. Otherwise I will just go ahead and commit this in the next day or two, since it's simple and easily MFCed.

If anyone with a Ryzen that has SMT enabled could print logical_cpus_mask (e.g., from kgdb) and paste it here, I'd appreciate it. I'd prefer to use that instead of a spin lock. Otherwise I will just go ahead and commit this in the next day or two, since it's simple and easily MFCed.

CURRENT from r334596 ok? SMT is enabled:

(kgdb) p/x logical_cpus_mask
$2 = {

__bits = {0xaaaaaaaa, 0x0, 0x0, 0x0}

}

Only update microcode on physical CPUs.

This revision now requires review to proceed.Jun 22 2018, 10:27 AM

Apologies for the delay. Would anyone with a Ryzen or TR be able to test a microcode update with the updated patch applied?

Apologies for the delay. Would anyone with a Ryzen or TR be able to test a microcode update with the updated patch applied?

I'd be able to if AMD has published any microcode updates for Ryzen/TR. It seems the ones they distribute through non-BIOS channels are just for Epyc.

In D15560#338076, @cem wrote:

Apologies for the delay. Would anyone with a Ryzen or TR be able to test a microcode update with the updated patch applied?

I'd be able to if AMD has published any microcode updates for Ryzen/TR. It seems the ones they distribute through non-BIOS channels are just for Epyc.

Ah, right. I emailed someone with the ability to test on Epyc, asking them to test.

The change is based on a hypothetical problem and is currently impractical to test. Moreover, according to the Intel SDM (8.7.11), modern CPUs do not in fact require sibling hyperthreads to synchronize updates. With the lack of clear documentation around this from AMD, I'll just drop this for now.

With the lack of clear documentation around this from AMD, I'll just drop this for now.

For future readers, https://reviews.freebsd.org/D15560#328547 is what I got out of the Epyc doc.

I don't think AMD Fam17h threads can update microcode independently โ€” they share the the same PATCH_LOADER MSR with other SMT thread on the same core. I'm not sure what happens if concurrent writers attempt to write the same PATCH_LOADER MSR โ€” wrmsr itself is serializing to the local instruction stream, but I don't know that that means anything against another thread on the same core.

The patch as-is looks sensible on 17h โ€” don't know about earlier AMD hardware.