Page MenuHomeFreeBSD

Implement early microcode loading for Intel i386 and amd64 platforms.
ClosedPublic

Authored by markj on Jul 20 2018, 8:45 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Jan 6, 7:16 PM
Unknown Object (File)
Sun, Jan 5, 8:52 AM
Unknown Object (File)
Sun, Jan 5, 8:52 AM
Unknown Object (File)
Sun, Jan 5, 8:52 AM
Unknown Object (File)
Sun, Jan 5, 8:52 AM
Unknown Object (File)
Sun, Jan 5, 8:51 AM
Unknown Object (File)
Sun, Jan 5, 8:51 AM
Unknown Object (File)
Sun, Jan 5, 3:54 AM

Details

Summary

The feature allows updates to be loaded as one of the first steps of CPU
initialization, in particular before CPU feature detection is performed
on the BSP. With this change we also automatically reload a selected
microcode update on CPUs following an ACPI resume.

The idea is to preload the update using standard loader(8)
functionality. The update file may consist of a single update (Intel
updates are per-<family,model,stepping> tuple), or of multiple updates
concatenated together. In the latter case, we use the algorithms
described in section 9.11 of the SDM to select the correct update for
the host CPU. Selection is performed on the BSP, followed by
application of the update. Once APs are started, ucode_load_ap()
references the update selected by the BSP. Somewhat later during boot,
ucode_release() allocates some kernel memory, copies the selected update
there, and frees memory backing the preloaded update file. This ensures
that we can distribute and load all microcode updates in a single file
without wasting much RAM.

Upon resume from ACPI suspend, each CPU invokes ucode_reload() to
re-apply the selected update, if any.

I did not implement AMD support in this revision since AMD updates are
not released nearly as frequently and are typically released by BIOS
vendors, and the AMD microcode update facilities are mostly
undocumented. In particular, none of my AMD laptops can be updated
using the microcode updates provided with devcpu-data, and I have been
unable to find testers.

Test Plan

I did some testing on a Haswell Xeon, and on a laptop with working
suspend/resume. I tested both i386 and amd64 on the laptop, but
do not own any 32-bit Intel CPUs. I plan to send a CFT.

To test the patch, we only need to tell the loader to load a file of type
"cpu_microcode". For instance, in loader.conf:

ucode_load="YES"
ucode_name="/boot/firmware/06-3f-02.6f"
ucode_type="cpu_microcode"

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

markj added reviewers: emaste, jhb, sbruno, kib, imp.

Don't log an error when no update is preloaded.

sys/i386/i386/machdep.c
2344 ↗(On Diff #45663)

This isn't early enough on i386 - we call identify_cpu() from locore there.

sys/x86/x86/ucode.c
253 ↗(On Diff #45663)

I will add a chicken switch for this.

  • Add a tunable to disable reclamation of memory and KVA used for preloaded microcode updates.
  • Call identify_cpu() a second time after loading microcode on the BSP on i386. Make the function callable from C.
  • Allocate physical memory for the selected microcode update and copy the update there before loading it. This means that we don't need to perform a copy in ucode_release(), and that we don't have to rely the loader to provide 16-byte alignment of the update file.

Suppose that we have some microcode blob loaded at boot time, and then a newer update is loaded with cpucontrol(8). Which blob would be tried on resume ?

sys/x86/acpica/acpi_wakeup.c
328 ↗(On Diff #45926)

Did you considered applyin the ucode update in wakeup code. E.g. before we ever enter the long mode.

sys/x86/x86/ucode.c
84 ↗(On Diff #45926)

I suggest to extract the hardware format definitions into a reusable header.

288 ↗(On Diff #45926)

Why do you exclude the ht siblings ? I believe it is at least innocent to apply the update on each pair of threads. I suspect that it might be more useful to not exclude it, for some future bugs.

Suppose that the initial update at the boot time failed. My reading of your code is that on resume, the same update would be tried. If this is true, wouldn't it be better to not retry the failing update on the resume ?

markj marked an inline comment as done.Aug 2 2018, 3:00 PM
In D16370#351629, @kib wrote:

Suppose that we have some microcode blob loaded at boot time, and then a newer update is loaded with cpucontrol(8). Which blob would be tried on resume ?

We'd re-apply the old blob. I'm not sure if this is a bug or not - if no blob is loaded at boot time, then upon resume the new update will be lost anyway.

sys/x86/acpica/acpi_wakeup.c
328 ↗(On Diff #45926)

I couldn't convince myself it was necessary, and it would require special code in ucode.c. The initial update is applied on the BSP only after we've entered long mode.

sys/x86/x86/ucode.c
84 ↗(On Diff #45926)

Will do.

288 ↗(On Diff #45926)

In section 9.11.6.3 of the Intel SDM it is stated that during MP initialization, updates to the same physical core must be serialized. Of course, we could use a mutex instead of the cpu_hyperthread check; I don't have a preference here.

Yes, I think you're right. We should check the return value of ucode_intel_load() and set ucode_data conditionally.

We'd re-apply the old blob. I'm not sure if this is a bug or not - if no blob is loaded at boot time, then upon resume the new update will be lost anyway.

I think with this change we'd consider the userland cpucontrol method deprecated.

sys/x86/x86/ucode.c
288 ↗(On Diff #45926)

Perhaps @bwidawsk or @scott.d.phillips_intel.com can comment on which approach is preferable.

We'd re-apply the old blob. I'm not sure if this is a bug or not - if no blob is loaded at boot time, then upon resume the new update will be lost anyway.

I think with this change we'd consider the userland cpucontrol method deprecated.

I disagree. There is a value in both approaches. cpucontrol is useful to check a new blob, also it is useful when you do not want to apply specific microcode patch always, but need it sometime. E.g. HSW updates removed TRM. Or you might want to apply some costly workaround right now, but not persistently.

My point is that after cpucontrol loaded new blob, user needs to re-evaluate cpu_features, and this should persist. So that update better be honored by the suspend/resume code, otherwise what is the point of the feature ? It seems to be quite simple, cpucontrol update should replace the ucode_data.

In D16370#351703, @kib wrote:

We'd re-apply the old blob. I'm not sure if this is a bug or not - if no blob is loaded at boot time, then upon resume the new update will be lost anyway.

I think with this change we'd consider the userland cpucontrol method deprecated.

I disagree. There is a value in both approaches. cpucontrol is useful to check a new blob, also it is useful when you do not want to apply specific microcode patch always, but need it sometime. E.g. HSW updates removed TRM. Or you might want to apply some costly workaround right now, but not persistently.

My point is that after cpucontrol loaded new blob, user needs to re-evaluate cpu_features, and this should persist. So that update better be honored by the suspend/resume code, otherwise what is the point of the feature ? It seems to be quite simple, cpucontrol update should replace the ucode_data.

Indeed, it's pretty straightforward. I will just add a function to replace ucode_data.

I think with this change we'd consider the userland cpucontrol method deprecated.

No, having both methods still makes a lot of sense. Earlyboot ucode is necessary to unbreak CPUs enough to boot. But cpucontrol allows ucode updates at runtime. If a CPU errata fix or security mitigation can be applied at runtime instead of with a reboot, everyone is happier.

Additionally, the two sources have potentially different distribution channels with different latency. Admins may be able to fetch new ucode directly from a vendor faster than we can cut a new release or EN. This doesn't help them if the new ucode is needed to fix boot, but the vast majority of ucode patches seem to be ok to apply at runtime (we've gotten away with only runtime patching for this long...).

In D16370#351721, @cem wrote:

I think with this change we'd consider the userland cpucontrol method deprecated.

No, having both methods still makes a lot of sense. Earlyboot ucode is necessary to unbreak CPUs enough to boot. But cpucontrol allows ucode updates at runtime. If a CPU errata fix or security mitigation can be applied at runtime instead of with a reboot, everyone is happier.

Well, runtime loading still has disadvantages. It is better than nothing for mission-critical servers that cannot tolerate downtime.

Additionally, the two sources have potentially different distribution channels with different latency.

How's that? The devcpu-data port supplies updates in a format that can be consumed by either method.

In D16370#351721, @cem wrote:

Well, runtime loading still has disadvantages. It is better than nothing for mission-critical servers that cannot tolerate downtime.

Sure.

Additionally, the two sources have potentially different distribution channels with different latency.

How's that? The devcpu-data port supplies updates in a format that can be consumed by either method.

Businesses may have early NDA access to microcode that the port does not have access to or cannot distribute.

Businesses may have early NDA access to microcode that the port does not have access to or cannot distribute.

But they could apply that microcode with either method.

Perhaps deprecated was the wrong word and I should say ucode-from-loader is the preferred method.

Businesses may have early NDA access to microcode that the port does not have access to or cannot distribute.

But they could apply that microcode with either method.

Perhaps deprecated was the wrong word and I should say ucode-from-loader is the preferred method.

I think also "limited". To support cpu_features changing at runtime, various other parts of the kernel also have to start using dynamic checks. Most users of cpu_features do so during boot and then don't check again. The current spectre stuff only does so explicitly. Loading the blob from the loader (which is not persistent since you can just remove the package or edit loader.conf to disable it) is more future proof in that it works with places that only check feature flags during boot.

Businesses may have early NDA access to microcode that the port does not have access to or cannot distribute.

But they could apply that microcode with either method.

Don't they have to somehow process it into something loader can load first?

Either way, seems like there are good caveats to add to cpucontrol.8 if they are not present already.

markj marked 3 inline comments as done.

Updates based on feedback:

  • Add a mechanism for cpuctl(4) to replace the existing update.
  • Move ucode header struct definitions to ucode.h.
  • Only set ucode_data during boot if we successfully loaded microcode on the BSP.
In D16370#351801, @cem wrote:

Sorry, I missed this.

Businesses may have early NDA access to microcode that the port does not have access to or cannot distribute.

But they could apply that microcode with either method.

Don't they have to somehow process it into something loader can load first?

Not as far as I know. The updates are distributed as blobs following a format defined in the SDM, and both cpucontrol(8) and the kernel can load them.

Either way, seems like there are good caveats to add to cpucontrol.8 if they are not present already.

Yeah. I'm planning to do some man page updates as a separate change.

Were there any other comments or concerns regarding this change? There is still some time left to make substantial changes if needed, but not very much.

This revision is now accepted and ready to land.Aug 10 2018, 3:07 PM
This revision was automatically updated to reflect the committed changes.
head/sys/x86/x86/ucode.c
344

This will break once memcpy() is an ifunc.

head/sys/x86/x86/ucode.c
344

So it would become an explicit byte copy loop, perhaps with volatile qualifiers. link_elf_ireloc() already did that for bzero().