The CPUCTL_UPDATE is supposed to be applied only to the CPU the ioctl(2)
was performed on. This is true for Intel CPUs, but for AMD the SMP
rendezvouz of amd_ucode_wrmsr() effectively executed it on all CPUs.
Also, the update failure was not reported.
Details
Diff Detail
- Repository
- rG FreeBSD src repository
- Lint
Lint Not Applicable - Unit
Tests Not Applicable
Event Timeline
@avg said that the old behaviour was intentional: https://reviews.freebsd.org/D15560#328553
I guess it's not a problem on sufficiently new CPUs? That said, why is it useful to update ucode on cores one by one?
I somewhat doubt that the arguments by @avg were true. If indeed the ucode update could affect the infinity fabric protocols, it is up to the ucode itself to handle it. If you do software IPI to update all cores, it is still not synced in the strict sense, the update only happen in much rapid succession.
Both Intel and AMD describe how to update ucode on single CPU (or might be on single CPU out of HT pair). Since this is how it is documented, I do not see a reason to not expose the per-cpu update ability to userspace.
sys/dev/cpuctl/cpuctl.c | ||
---|---|---|
441 | Broadcasting the update when you target a single CPU is excessive. |
Well, the activity of the cores is quite limited during the update when using smp_rendezvous.
Both Intel and AMD describe how to update ucode on single CPU (or might be on single CPU out of HT pair). Since this is how it is documented, I do not see a reason to not expose the per-cpu update ability to userspace.
Ok. I'm just noting that this might not work on old CPUs (back when AMD did not document the ucode update procedures IIRC). I have no objection.
sys/dev/cpuctl/cpuctl.c | ||
---|---|---|
441 | update_intel() does the same thing BTW, both should be fixed. |
Maybe I misdiagnosed the problem back then, on the hardware that I had.
But I certainly did some investigation which led me towards trying to do the ucode update as closely in time as possible.
The system in question was from 10h family (Phenom II, I think).
I still have it somewhere, so I may try to test this change on that system if there is any interest in such a test and I find time.
Andriy didn't say there that the old behavior was intentional, rather the opposite - that old behavior causes a problems when updating two SMT cores simultaneously. My patch should also address the problems Andriy had with that old CPU.
I think smp_rendezvous() is still needed when we update a single core. We want any SMT sibling of it to spin executing only in non-CISC instructions. We potentially can create a map of CPUs and rendezvous only the SMT siblings of CPU being updated, instead of all CPUs.
That's not how I read it. "I changed the update code to what it is now (synchronized update), because on my Phenom II based systems the old update mode, core after core, driven by userland, caused hard hangs."
In particular, serialized updates caused hard hangs.
I think smp_rendezvous() is still needed when we update a single core. We want any SMT sibling of it to spin executing only in non-CISC instructions. We potentially can create a map of CPUs and rendezvous only the SMT siblings of CPU being updated, instead of all CPUs.
I think some comment needs to explain this, ideally with some reference to AMD documentation. In https://docs.amd.com/v/u/en-US/57930-A0-PUB_3.00 I see the microcode file header has a bit, "load control" which determines whether sibling threads should all be updated. In the ucode.c loader we currently always skip SMT siblings, see the check in ucode_load_ap().
Anyway, I do not object to the change.
Ok, guys, can you please approve this change, then? The rationale is that most recent microcode that is published in Linux and already committed to our ports fails to load on some CPUs and the tool right now doesn't report that. Reports success instead. The fact that on AMD the tool updates all cores, unlike what documentation says and unlike what the same command does on Intel, is also a bug.