
x86: x86_msr_op(): Add comments, guarantee atomicity
ClosedPublic

Authored by olce on Sat, Jan 31, 11:30 AM.

Details

Summary

In 3 small commits, grouped here:

  • x86: x86_msr_op(): Move setting mode up, delineate logical blocks

No functional changes (intended).

  • x86: x86_msr_op(): Simplify assertions

Simplify them by moving them into more natural places, i.e., default
cases of 'switch' statements.

No functional change (intended).

  • x86: x86_msr_op(): MSR_OP_LOCAL: Disable interrupts on atomic ops

With MSR_OP_LOCAL and the non-naturally-atomic operations (MSR_OP_ANDNOT
and MSR_OP_OR), there is no guarantee that we are not interrupted between
reading and writing the MSR, and the interrupting code could itself
perform some operation on that MSR, whose effect would then be lost.

Prevent that problem by temporarily disabling interrupts around MSR
manipulation.
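
For illustration, a minimal sketch of the fix this commit describes, assuming FreeBSD's intr_disable()/intr_restore() and rdmsr()/wrmsr() primitives (the 'msr' and 'bits' variables are hypothetical):

    /*
     * MSR_OP_ANDNOT and MSR_OP_OR are read-modify-write sequences: an
     * interrupt arriving between rdmsr() and wrmsr() could itself update
     * the MSR, and that update would be silently overwritten.  Masking
     * interrupts around the sequence makes it atomic with respect to
     * code running on the local CPU.
     */
    register_t saved_intr;
    uint64_t v;

    saved_intr = intr_disable();
    v = rdmsr(msr);
    wrmsr(msr, v | bits);           /* The MSR_OP_OR case. */
    intr_restore(saved_intr);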

Diff Detail

Repository
rG FreeBSD src repository

Event Timeline

olce requested review of this revision. Sat, Jan 31, 11:30 AM
olce retitled this revision from x86: x86_msr_op(): Move setting mode up, delineate logical blocks to x86: x86_msr_op(): Add comments, cosmetic changes. Sat, Jan 31, 11:32 AM
olce edited the summary of this revision.
sys/x86/include/x86_var.h
177

Or the move or interruption can be tolerated.

I suggest saying what you want differently:

All modes, except MSR_OP_LOCAL, are executed on the target CPU with interrupts disabled.

This gives all information without being too verbose.

179

Technically, "pinned" has a very specific meaning; we are not executing pinned.

sys/x86/x86/cpu_machdep.c
172

You already said this in the herald comment; I do not see much use in repeating it.

sys/x86/x86/cpu_machdep.c
175

Isn't this an argument for disabling interrupts in the OP_LOCAL case?

sys/x86/x86/cpu_machdep.c
175

In x86_msr_op(), you mean? No, I do not think it is needed. Usages of the _LOCAL op are confined to hammer_time() or nearby stages of boot, where we are already guaranteed to be single-CPU with interrupts disabled.

sys/x86/x86/cpu_machdep.c
175

Usages of the _LOCAL op are confined to hammer_time() or nearby stages of boot

But x86_msr_op() is supposed to be a general-purpose interface, so the current behaviour seems like a footgun.

If it's important to avoid redundant intr_disable() calls, then some MSR_OP_LOCAL_INTR_DISABLED variant could be introduced and used from hammer_time(), etc.

sys/x86/x86/cpu_machdep.c
175

There is no use, at least for now, for an MSR_OP_LOCAL which would explicitly disable interrupts on the current CPU. Moreover, I think there would be no use for it at all, since how would the caller know on which CPU it is executing?

If you insist, we can rename MSR_OP_LOCAL to e.g. MSR_OP_HAMMERTIME.

sys/x86/x86/cpu_machdep.c
175

how would the caller know on which CPU it is executing?

Maybe it called sched_bind() instead of disabling local interrupts. hwpstate_intel() does this, e.g., in sysctl_epp_select().

If you insist, we can rename MSR_OP_LOCAL to e.g. MSR_OP_HAMMERTIME.

I don't insist on anything in particular, but it seems it'd be simplest to assert that interrupts are disabled in the MSR_OP_LOCAL case.

sys/x86/x86/cpu_machdep.c
175

how would the caller know on which CPU it is executing?

Maybe it called sched_bind() instead of disabling local interrupts. hwpstate_intel() does this, e.g., in sysctl_epp_select().

But hwpstate_intel does not use msr_op at all.

If you insist, we can rename MSR_OP_LOCAL to e.g. MSR_OP_HAMMERTIME.

I don't insist on anything in particular, but it seems it'd be simplest to assert that interrupts are disabled in the MSR_OP_LOCAL case.

Again, if binding to a CPU was done, and the op is atomic, why disable the interrupts?
Also, apparently I added MSR_OP_SCHED_ONE, which should be used if an action on a specific CPU needs to be made. I think renaming LOCAL to HAMMERTIME is the best way to close this.

sys/x86/x86/cpu_machdep.c
175

But hwpstate_intel does not use msr_op at all.

No, but it could.

Again, if binding to a CPU was done, and the op is atomic, why disable the interrupts?

If the operation is atomic, ok, but e.g., MSR_OP_OR is not atomic. It's a read-modify-write. So without interrupts disabled, another thread could be scheduled between the rdmsr() and wrmsr(). That's what Olivier's comment is explaining.

I think it makes the most sense for this interface to assert that interrupts are disabled.
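
On amd64, the assertion proposed here could look like the following sketch, using read_rflags() and PSL_I (the message string is hypothetical):

    /* Interrupts must already be masked when MSR_OP_LOCAL is used. */
    KASSERT((read_rflags() & PSL_I) == 0,
        ("x86_msr_op: MSR_OP_LOCAL requires interrupts to be disabled"));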

  • Disable interrupts on MSR_OP_LOCAL and non-atomic MSR manipulation.
  • Trim and update comments.
sys/x86/include/x86_var.h
177

Or the move or interruption can be tolerated.

I don't see any scenario where this would make sense. The existence of such a possibility is almost certainly a bug (and if some SMM handler does it, all bets are off).

I suggest saying what you want differently:

All modes, except MSR_OP_LOCAL, are executed on the target CPU with interrupts disabled.

This gives all information without being too verbose.

Yes, reworked (but I prefer to be clear on the requirements for MSR_OP_LOCAL in the comments, even if adding an assertion).

179

In FreeBSD parlance, yes. (In general, technically, we are effectively "pinned" in the sense that we run on a CPU and just cannot be moved.)

sys/x86/x86/cpu_machdep.c
175

Maybe it called sched_bind() instead of disabling local interrupts. hwpstate_intel() does this, e.g., in sysctl_epp_select().

That indeed could happen. Though, for this case, I'll probably remove any such occurrences from hwpstate_intel at some point. And I have a revision in Phabricator being reviewed that removes the call to x86_msr_op() from hwpstate_amd entirely.

I've found a more compelling case: cpuctl_do_eval_cpu_features(). Concurrent ioctl() on /dev/cpuctlX could cause problems (coupled with concurrent changes to the machdep sysctl knobs about mitigations), however unlikely.

I don't insist on anything in particular, but it seems it'd be simplest to assert that interrupts are disabled in the MSR_OP_LOCAL case.

In the just-mentioned case, interrupts are not even disabled.

175

But hwpstate_intel does not use msr_op at all.

True, but as markj@ said, that's an interface that "looks" general. That's why I took some time to review it and added comments. (Actually, I started some hack to internally convert MSR_OP_SCHED_ONE or MSR_OP_RENDEZVOUS_ONE targeting the current CPU to just MSR_OP_LOCAL, and that's where I realized the requirements; I thought of fixing it with spinlock_enter(), but that cannot work as the scheduler is not up at that point during boot; a simple intr_disable() works.)

Again, if bind to cpu was done, and the op is atomic, why disabling the interrupts.

MSR_OP_ANDNOT and MSR_OP_OR (the read-modify-write ones) are not atomic if interrupts are not disabled.

Also, apparently I added MSR_OP_SCHED_ONE, which should be used if an action on specific CPU needs to be made. I think renaming the LOCAL to HAMMERTIME is the best way to close this.

That does not close it since I've shown with the case above that we already have potential concurrency problems.

sys/x86/include/x86_var.h
175

This is not true. For the SCHED modes, we only do bind; for RENDEZVOUS it is true.

sys/x86/include/x86_var.h
175

It is true, because we hold the thread lock, which is a spinlock, in-between sched_bind() and sched_unbind().
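
For context, the SCHED-mode pattern under discussion, roughly as it appears in x86_msr_op() (the MSR update is shown for the OR case; 'cpu', 'msr', and 'bits' are hypothetical):

    struct thread *td = curthread;
    uint64_t v;

    thread_lock(td);            /* The thread lock is a spin lock. */
    sched_bind(td, cpu);        /* Migrate curthread to 'cpu'. */
    v = rdmsr(msr);             /* With the thread lock still held, we */
    wrmsr(msr, v | bits);       /* cannot be preempted in between. */
    sched_unbind(td);
    thread_unlock(td);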

olce retitled this revision from x86: x86_msr_op(): Add comments, cosmetic changes to x86: x86_msr_op(): Add comments, guarantee atomicity. Sun, Feb 1, 10:16 PM
olce edited the summary of this revision.
sys/x86/include/x86_var.h
175

Oh, yes, I coded it with the micro-optimization, so I am guilty there.
In fact, I think that this should be changed, or at least documented, as if the thread lock were dropped around the MSR access.

I plan to add the SCHED_SAFE variant of the mode after your patch is committed. There, it is absolutely required not to hold the thread lock around msr_safe() accesses. This was the purpose of the SCHED variant: to allow execution in a non-critical context.

sys/x86/include/x86_var.h
175

I coded the SAFE option for msr_op, see D55045, for illustration of this point. I am not asking for review ATM; I will do so after this revision lands and I rebase.

Gentle ping.

Is it OK like that? Or would you prefer, e.g., always disabling interrupts in the MSR_OP_LOCAL case?


I am fine with either case; IMO it is simpler to disable the interrupts unconditionally. It is boot code only anyway, and we should target code size and not code speed.

But did you note my comment from yesterday about SCHED, for the herald comment?

olce edited the summary of this revision.

Unconditionally disable interrupts on MSR_OP_LOCAL.

In D54996#1258511, @kib wrote:

I am fine with either case; IMO it is simpler to disable the interrupts unconditionally. It is boot code only anyway, and we should target code size and not code speed.

Done.

But did you note my comment from yesterday about SCHED, for the herald comment?

Yes. My intention was rather to leave that discussion for the review of your patch, instead of amending the code comments so that they no longer match what the code currently does, but there still seems to be some divergence here.

I think our goal should be to make these MSR accesses atomic, and change the rest to make it happen rather than the other way around. The SAFE variant should not be treated differently.

The only difference for SAFE is using *msr_safe(), which can fault with #GP(0) -> T_PROTFLT, with handlers "installed" through pcb_onfault. After re-examining amd64's trap.c code, I fail to see any reason why this wouldn't work with interrupts disabled or even while holding a spinlock. So, where does "There, it is absolutely required not to hold the thread lock around msr_safe() accesses" come from?
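
For reference, the *msr_safe() variants catch the #GP via pcb_onfault and return an error instead of panicking; a caller would use them roughly as sketched below (the error value and the 'msr' and 'bits' variables are illustrative):

    uint64_t v;

    /* rdmsr_safe()/wrmsr_safe() return non-zero if the access faulted. */
    if (rdmsr_safe(msr, &v) != 0)
        return (ENXIO);
    if (wrmsr_safe(msr, v | bits) != 0)
        return (ENXIO);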


SAFE variants cannot be called while a spinlock is owned, otherwise sched_bind() cannot work: it needs to migrate the thread to a different CPU. Then, if trap() finds that interrupts were disabled and we do not own a spinlock, it re-enables interrupts, e.g., for #GP. But this is only an example; the trap handling code really 'does not want' to be called (by hw) when interrupts are disabled. There are probably more fine details that can be dug out, but the intent always was that we should avoid faulting while interrupts are disabled.

And yes, SAFE variants are added specifically to be treated differently from the RENDEZVOUS case. If you want/need atomicity, RENDEZVOUS is the right choice; if atomicity is not needed, which is not a rare case, SAFE is good.

kib accepted this revision. (Edited) Tue, Feb 3, 12:06 PM

I think it would be simpler to go forward if you commit this version now; then I will provide the proper herald comment explaining the modes in my SAFE patch.

This revision is now accepted and ready to land. Tue, Feb 3, 12:06 PM
In D54996#1258764, @kib wrote:

SAFE variants cannot be called while a spinlock is owned, otherwise sched_bind() cannot work: it needs to migrate the thread to a different CPU.

I'm talking about holding a spinlock around calls to *msr_safe(), specifically here the thread lock.

Then, if trap() finds that interrupts were disabled and we do not own a spinlock, it re-enables interrupts, e.g., for #GP.

No, for #GP (and some others) from kernel mode, it typically does not; that's my point. And I really don't see why it should, since all we are doing in this case is either invoking a specific handler (which has all the leeway to re-enable them, or to just return and let the original routine's callers handle that) or ending up in trap_fatal().

But this is only an example; the trap handling code really 'does not want' to be called (by hw) when interrupts are disabled. There are probably more fine details that can be dug out, but the intent always was that we should avoid faulting while interrupts are disabled.

That's fine as a general intent (and, even more generally, we shouldn't be executing anything with interrupts disabled unless there's a good reason to), but the exceptions we have here look fine to me, and in the case of #GP it completely makes sense. I guess that's not your point of view, but then why?


In D54996#1258764, @kib wrote:

And yes, SAFE variants are added specifically to be treated differently from the RENDEZVOUS case. If you want/need atomicity, RENDEZVOUS is the right choice; if atomicity is not needed, which is not a rare case, SAFE is good.

Typically, if I were to use the SAFE variant in hwpstate_amd instead of the ad-hoc handlers we have now, I'd want it to be atomic. So why have a non-atomic SAFE variant, when we can easily have an atomic one, if I'm not mistaken?

FreeBSD is a realtime kernel, where regions with disabled interrupts are exceptional. A trap happening with interrupts disabled means that the CPU met some condition that should not be there.
It might be that trap.c and exceptions.S have somewhat evolved, but the guiding rule there was simplification. As a secondary effect, it might take exceptions with interrupts disabled, in some cases.
It still assumes that, e.g., GSBASE is valid, which is generally not true for regions with disabled interrupts. So I prefer to keep it close to the original design.

That said, I added a critical section around the ops for the SCHED method in the updated patch for x86_msr_op(). This should be more than enough for your uses in the power management driver.
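
A critical section around the op would look roughly as follows; note that critical_enter() prevents preemption but, unlike holding a spin lock, does not disable interrupts, so a faulting *msr_safe() access remains permissible inside it ('msr' and 'bits' are hypothetical):

    uint64_t v;

    critical_enter();           /* Block preemption, not interrupts. */
    v = rdmsr(msr);
    wrmsr(msr, v | bits);
    critical_exit();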