
x86: x86_msr_op(): Add comments, guarantee atomicity
ClosedPublic

Authored by olce on Sat, Jan 31, 11:30 AM.

Details

Summary

In 3 small commits, grouped here:

  • x86: x86_msr_op(): Move setting mode up, delineate logical blocks

No functional changes (intended).

  • x86: x86_msr_op(): Simplify assertions

Simplify them by moving them into more natural places, i.e., default
cases of 'switch' statements.

No functional change (intended).

  • x86: x86_msr_op(): MSR_OP_LOCAL: Disable interrupts on atomic ops

With MSR_OP_LOCAL and the non-naturally-atomic operations (MSR_OP_ANDNOT
and MSR_OP_OR), there is no guarantee that we are not interrupted between
reading and writing the MSR, and the interrupting code could itself
perform some operation on that MSR, whose effect would then be lost.

Prevent that problem by temporarily disabling interrupts around MSR
manipulation.
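
For illustration, a minimal sketch of the fix this commit describes, assuming FreeBSD's intr_disable()/intr_restore() and rdmsr()/wrmsr() primitives (the 'msr' and 'bits' variables are hypothetical):

    /*
     * MSR_OP_ANDNOT and MSR_OP_OR are read-modify-write sequences: an
     * interrupt arriving between rdmsr() and wrmsr() could itself update
     * the MSR, and that update would be silently overwritten.  Masking
     * interrupts around the sequence makes it atomic with respect to
     * code running on the local CPU.
     */
    register_t saved_intr;
    uint64_t v;

    saved_intr = intr_disable();
    v = rdmsr(msr);
    wrmsr(msr, v | bits);           /* The MSR_OP_OR case. */
    intr_restore(saved_intr);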

Diff Detail

Repository
rG FreeBSD src repository

Event Timeline

olce requested review of this revision. Sat, Jan 31, 11:30 AM
olce retitled this revision from x86: x86_msr_op(): Move setting mode up, delineate logical blocks to x86: x86_msr_op(): Add comments, cosmetic changes. Sat, Jan 31, 11:32 AM
olce edited the summary of this revision.
sys/x86/include/x86_var.h
177

Or the move or interruption can be tolerated.

I suggest saying what you want differently:

All modes, except MSR_OP_LOCAL, are executed on the target CPU with interrupts disabled.

This gives all information without being too verbose.

179

Technically, "pinned" has a very specific meaning; we are not executing pinned.

sys/x86/x86/cpu_machdep.c
172

You already said this in the herald comment; I do not see much use in repeating it.

sys/x86/x86/cpu_machdep.c
175

Isn't this an argument for disabling interrupts in the OP_LOCAL case?

sys/x86/x86/cpu_machdep.c
175

In x86_msr_op(), you mean? No, I do not think it is needed. Usages of the _LOCAL op are confined to hammer_time() or nearby stages of boot, where we are already guaranteed to be single-CPU with interrupts disabled.

sys/x86/x86/cpu_machdep.c
175

Usages of the _LOCAL op are confined to hammer_time() or nearby stages of boot

But x86_msr_op() is supposed to be a general-purpose interface, so the current behaviour seems like a footgun.

If it's important to avoid redundant intr_disable() calls, then some MSR_OP_LOCAL_INTR_DISABLED variant could be introduced and used from hammer_time(), etc.

sys/x86/x86/cpu_machdep.c
175

There is no use, at least for now, for an MSR_OP_LOCAL which would explicitly disable interrupts on the current CPU. Moreover, I think there would be no use for it at all, since how would the caller know on which CPU it is executing?

If you insist, we can rename MSR_OP_LOCAL to e.g. MSR_OP_HAMMERTIME.

sys/x86/x86/cpu_machdep.c
175

how would the caller know on which CPU it is executing?

Maybe it called sched_bind() instead of disabling local interrupts. hwpstate_intel() does this, e.g., in sysctl_epp_select().

If you insist, we can rename MSR_OP_LOCAL to e.g. MSR_OP_HAMMERTIME.

I don't insist on anything in particular, but it seems it'd be simplest to assert that interrupts are disabled in the MSR_OP_LOCAL case.

sys/x86/x86/cpu_machdep.c
175

how would the caller know on which CPU it is executing?

Maybe it called sched_bind() instead of disabling local interrupts. hwpstate_intel() does this, e.g., in sysctl_epp_select().

But hwpstate_intel does not use msr_op at all.

If you insist, we can rename MSR_OP_LOCAL to e.g. MSR_OP_HAMMERTIME.

I don't insist on anything in particular, but it seems it'd be simplest to assert that interrupts are disabled in the MSR_OP_LOCAL case.

Again, if binding to a CPU was done, and the op is atomic, why disable the interrupts?
Also, apparently I added MSR_OP_SCHED_ONE, which should be used if an action on a specific CPU needs to be made. I think renaming LOCAL to HAMMERTIME is the best way to close this.

sys/x86/x86/cpu_machdep.c
175

But hwpstate_intel does not use msr_op at all.

No, but it could.

Again, if binding to a CPU was done, and the op is atomic, why disable the interrupts?

If the operation is atomic, ok, but e.g., MSR_OP_OR is not atomic. It's a read-modify-write. So without interrupts disabled, another thread could be scheduled between the rdmsr() and wrmsr(). That's what Olivier's comment is explaining.

I think it makes the most sense for this interface to assert that interrupts are disabled.
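
On amd64, the assertion proposed here could look like the following sketch, using read_rflags() and PSL_I (the message string is hypothetical):

    /* Interrupts must already be masked when MSR_OP_LOCAL is used. */
    KASSERT((read_rflags() & PSL_I) == 0,
        ("x86_msr_op: MSR_OP_LOCAL requires interrupts to be disabled"));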

  • Disable interrupts on MSR_OP_LOCAL and non-atomic MSR manipulation.
  • Trim and update comments.
sys/x86/include/x86_var.h
177

Or the move or interruption can be tolerated.

I don't see any scenario where this would make sense. The existence of such a possibility is almost certainly a bug (and if some SMM handler does it, all bets are off).

I suggest saying what you want differently:

All modes, except MSR_OP_LOCAL, are executed on the target CPU with interrupts disabled.

This gives all information without being too verbose.

Yes, reworked (but I prefer to be clear on the requirements for MSR_OP_LOCAL in the comments, even if adding an assertion).

179

In FreeBSD parlance, yes. (In general, technically, we are effectively "pinned" in the sense that we run on a CPU and just cannot be moved.)

sys/x86/x86/cpu_machdep.c
175

Maybe it called sched_bind() instead of disabling local interrupts. hwpstate_intel() does this, e.g., in sysctl_epp_select().

That indeed could happen. Though, for this case, I'll probably remove any such occurrences from hwpstate_intel at some point. And I have a revision in Phabricator being reviewed that removes the call to x86_msr_op() from hwpstate_amd entirely.

I've found a more compelling case: cpuctl_do_eval_cpu_features(). Concurrent ioctl() on /dev/cpuctlX could cause problems (coupled with concurrent changes to the machdep sysctl knobs about mitigations), however unlikely.

I don't insist on anything in particular, but it seems it'd be simplest to assert that interrupts are disabled in the MSR_OP_LOCAL case.

In the just-mentioned case, interrupts are not even disabled.

175

But hwpstate_intel does not use msr_op at all.

True, but as markj@ said, that's an interface that "looks" general. That's why I took some time to review it and added comments. (Actually, I started some hack to internally convert MSR_OP_SCHED_ONE or MSR_OP_RENDEZVOUS_ONE targeting the current CPU to just MSR_OP_LOCAL, and that's where I realized the requirements; I thought of fixing it with spinlock_enter(), but that cannot work as the scheduler is not up at that point during boot; a simple intr_disable() works.)

Again, if bind to cpu was done, and the op is atomic, why disabling the interrupts.

MSR_OP_ANDNOT and MSR_OP_OR (the read-modify-write ones) are not atomic if interrupts are not disabled.

Also, apparently I added MSR_OP_SCHED_ONE, which should be used if an action on specific CPU needs to be made. I think renaming the LOCAL to HAMMERTIME is the best way to close this.

That does not close it since I've shown with the case above that we already have potential concurrency problems.

sys/x86/include/x86_var.h
175

This is not true. For the SCHED modes, we only do bind; for RENDEZVOUS it is true.

sys/x86/include/x86_var.h
175

It is true, because we hold the thread lock, which is a spinlock, in-between sched_bind() and sched_unbind().
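
For context, the SCHED-mode pattern under discussion, roughly as it appears in x86_msr_op() (the MSR update is shown for the OR case; 'cpu', 'msr', and 'bits' are hypothetical):

    struct thread *td = curthread;
    uint64_t v;

    thread_lock(td);            /* The thread lock is a spin lock. */
    sched_bind(td, cpu);        /* Migrate curthread to 'cpu'. */
    v = rdmsr(msr);             /* With the thread lock still held, we */
    wrmsr(msr, v | bits);       /* cannot be preempted in between. */
    sched_unbind(td);
    thread_unlock(td);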

olce retitled this revision from x86: x86_msr_op(): Add comments, cosmetic changes to x86: x86_msr_op(): Add comments, guarantee atomicity. Sun, Feb 1, 10:16 PM
olce edited the summary of this revision.
sys/x86/include/x86_var.h
175

Oh, yes, I coded it with the micro-optimization, so I am guilty there.
In fact, I think that this should be changed, or at least documented, as if the thread lock were dropped around the MSR access.

I plan to add the SCHED_SAFE variant of the mode after your patch is committed. There, it is absolutely required not to hold the thread lock around msr_safe() accesses. This was the purpose of the SCHED variant: to allow execution in a non-critical context.

sys/x86/include/x86_var.h
175

I coded the SAFE option for msr_op, see D55045, for illustration of this point. I am not asking for review ATM; I will do so after this revision lands and I rebase.

Gentle ping.

Is it OK like that? Or would you prefer, e.g., always disabling interrupts in the MSR_OP_LOCAL case?


I am fine with either case; IMO it is simpler to disable the interrupts unconditionally. It is boot code only anyway, and we should target code size and not code speed.

But did you note my comment from yesterday about SCHED, for the herald comment?

olce edited the summary of this revision.

Unconditionally disable interrupts on MSR_OP_LOCAL.

In D54996#1258511, @kib wrote:

I am fine with either case; IMO it is simpler to disable the interrupts unconditionally. It is boot code only anyway, and we should target code size and not code speed.

Done.

But did you note my comment from yesterday about SCHED, for the herald comment?

Yes. My intention was rather to leave that discussion for the review of your patch, instead of amending the code comments so that they no longer match what the code currently does, but there still seems to be some divergence here.

I think our goal should be to make these MSR accesses atomic, and change the rest to make it happen rather than the other way around. The SAFE variant should not be treated differently.

The only difference for SAFE is using *msr_safe(), which can fault with #GP(0) -> T_PROTFLT, with handlers "installed" through pcb_onfault. After re-examining amd64's trap.c code, I fail to see any reason why this wouldn't work with interrupts disabled or even while holding a spinlock. So, where does "There, it is absolutely required not to hold the thread lock around msr_safe() accesses" come from?
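
For reference, the *msr_safe() variants catch the #GP via pcb_onfault and return an error instead of panicking; a caller would use them roughly as sketched below (the error value and the 'msr' and 'bits' variables are illustrative):

    uint64_t v;

    /* rdmsr_safe()/wrmsr_safe() return non-zero if the access faulted. */
    if (rdmsr_safe(msr, &v) != 0)
        return (ENXIO);
    if (wrmsr_safe(msr, v | bits) != 0)
        return (ENXIO);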


SAFE variants cannot be called while a spinlock is owned, otherwise sched_bind() cannot work: it needs to migrate the thread to a different CPU. Then, if trap() finds that interrupts were disabled and we do not own a spinlock, it re-enables interrupts, e.g., for #GP. But this is only an example; the trap handling code really 'does not want' to be called (by hw) when interrupts are disabled. There are probably more fine details that can be dug out, but the intent always was that we should avoid faulting while interrupts are disabled.

And yes, SAFE variants are added specifically to be treated differently from the RENDEZVOUS case. If you want/need atomicity, RENDEZVOUS is the right choice; if atomicity is not needed, which is not a rare case, SAFE is good.

kib accepted this revision. (Edited) Tue, Feb 3, 12:06 PM

I think it would be simpler to go forward if you commit this version now; then I will provide the proper herald comment explaining the modes in my SAFE patch.

This revision is now accepted and ready to land. Tue, Feb 3, 12:06 PM
In D54996#1258764, @kib wrote:

SAFE variants cannot be called while a spinlock is owned, otherwise sched_bind() cannot work: it needs to migrate the thread to a different CPU.

I'm talking about holding a spinlock around calls to *msr_safe(), specifically here the thread lock.

Then, if trap() finds that interrupts were disabled and we do not own a spinlock, it re-enables interrupts, e.g., for #GP.

No, for #GP (and some others) from kernel mode, it typically does not; that's my point. And I really don't see why it should, since all we are doing in this case is either invoking a specific handler (which has all the leeway to re-enable them, or to just return and let the original routine's callers handle that) or ending up in trap_fatal().

But this is only an example; the trap handling code really 'does not want' to be called (by hw) when interrupts are disabled. There are probably more fine details that can be dug out, but the intent always was that we should avoid faulting while interrupts are disabled.

That's fine as a general intent (and, even more generally, we shouldn't be executing anything with interrupts disabled unless there's a good reason to), but the exceptions we have here look fine to me, and in the case of #GP it completely makes sense. I guess that's not your point of view, but then why?


In D54996#1258764, @kib wrote:

And yes, SAFE variants are added specifically to be treated differently from the RENDEZVOUS case. If you want/need atomicity, RENDEZVOUS is the right choice; if atomicity is not needed, which is not a rare case, SAFE is good.

Typically, if I were to use the SAFE variant in hwpstate_amd instead of the ad-hoc handlers we have now, I'd want it to be atomic. So why have a non-atomic SAFE variant, when we can easily have an atomic one, if I'm not mistaken?

FreeBSD is a realtime kernel, where regions with disabled interrupts are exceptional. A trap happening with interrupts disabled means that the CPU met some condition that should not be there.
It might be that trap.c and exceptions.S have somewhat evolved, but the guiding rule there was simplification. As a secondary effect, it might take exceptions with interrupts disabled, in some cases.
It still assumes that, e.g., GSBASE is valid, which is generally not true for regions with disabled interrupts. So I prefer to keep it close to the original design.

That said, I added a critical section around the ops for the SCHED method in the updated patch for x86_msr_op(). This should be more than enough for your uses in the power management driver.
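
A critical section around the op would look roughly as follows; note that critical_enter() prevents preemption but, unlike holding a spin lock, does not disable interrupts, so a faulting *msr_safe() access remains permissible inside it ('msr' and 'bits' are hypothetical):

    uint64_t v;

    critical_enter();           /* Block preemption, not interrupts. */
    v = rdmsr(msr);
    wrmsr(msr, v | bits);
    critical_exit();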