Page MenuHomeFreeBSD

x86: add a safe variant of MSR_OP_SCHED* operations
Needs ReviewPublic

Authored by kib on Mon, Feb 2, 12:01 AM.
Tags
None
Referenced Files
F144619914: D55045.diff
Tue, Feb 10, 8:22 AM
Unknown Object (File)
Mon, Feb 9, 10:34 PM
Unknown Object (File)
Tue, Feb 3, 10:48 PM
Unknown Object (File)
Tue, Feb 3, 2:37 AM
Unknown Object (File)
Mon, Feb 2, 10:55 PM
Unknown Object (File)
Mon, Feb 2, 4:46 PM
Unknown Object (File)
Mon, Feb 2, 11:59 AM
Unknown Object (File)
Mon, Feb 2, 8:32 AM
Subscribers

Details

Reviewers
markj
olce
Summary
The modifier executes the operation using msr{read,write}_safe()
functions instead of plain msr reads and writes.  Returns EFAULT if any
MSR access caused #GP.

x86_msr_op(9): consistently return the value read from MSR

x86_msr_op_one((9): execute SCHED ops in critical section

x86: provide extended description for x86_msr_op(9)

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

kib edited the summary of this revision. (Show Details)
kib added reviewers: markj, olce.

finalize the change

Minor clarifications to the description.

I do not think that converting the comment into a man page is warranted.

Execute SCHED ops in critical section

kib edited the summary of this revision. (Show Details)

Fix locking error with sched_bind()

I'm again not sure if this is the direction we want to go. As x86_msr_op() is about a single (read or read-modify-write) operation, it has a very short execution time. Not only using sched_bind() seems a big hammer for that (it has higher overhead), but also depending on the priority of the calling thread execution might be delayed a significant amount of time (with a real time thread already executing there, possibly indefinitely). To me, and IIRC one of our discussions for markj@ as well, the direction was more to deprecate using sched_bind() for this purpose, eventually retiring all MSR_OP_SCHED_* variants, whereas that patch *adds* new ones.

And, for a safe variant to work with MSR_OP_RENDEZVOUS*, then there's no other choice than to be able to take a fault on MSR access while interrupts are disabled, so we are back to D54996's discussion. I'm not sure why you mentioned GSBASE in the discussion; I read some of the interrupt handling assembly code long ago, and although I don't recall well I thought that the dances around this were already performed there, before we even reach trap(). And why faulting from controlled addresses, as is the case for rdmsr_safe() or wrmsr_safe(), even with interrupts disabled, would be wrong?

This discussion is really wider than just the MSR_OP_* topic, as we are already using *msr_safe() in callbacks triggered by smp_rendezvous() in at least hwpstate_amd(4).

In addition, for the potential use cases I'm seeing in hwpstate_amd(4) (but don't want to insist too much on this example, since we are using custom handlers which are probably not replaceable: We are rarely accessing a single MSR at a time, and when we modify a single one, we start from some cached value that we want to access from the target CPU; additionally, execution on some target CPU with interrupts disabled is also used as a mean to atomically modify the softc (a critical section would suffice though)), I'd like to:

  • Use the rdmsr_safe() and wrmsr_safe() variants always.
  • When specifying a read-modify-write operation, on failure, know if it's the read or only the write that failed (this may prove important for MSR that do not tolerate some values).

It's really a good idea that it is possible to retrieve back the MSR value before it is modified. It would be a requirement for hwpstate_amd(4).

I'm again not sure if this is the direction we want to go. As x86_msr_op() is about a single (read or read-modify-write) operation, it has a very short execution time. Not only using sched_bind() seems a big hammer for that (it has higher overhead), but also depending on the priority of the calling thread execution might be delayed a significant amount of time (with a real time thread already executing there, possibly indefinitely). To me, and IIRC one of our discussions for markj@ as well, the direction was more to deprecate using sched_bind() for this purpose, eventually retiring all MSR_OP_SCHED_* variants, whereas that patch *adds* new ones.

Yes, IMO it is almost always wrong to use sched_bind() for short sections of code. I lost count of the number of times I had to debug problems caused by this pattern, but the first one was related to commit e42412a96695d0e9116de5c6ac75823d2711218d. For similar reasons, quiesce_all_cpus() should never be used either.

In D54996 there is a claim that FreeBSD is a realtime kernel, but this is not true in general. If a user thread binds to a specific CPU, there is no guarantee it will make progress. So if we want to have sysctl handlers which toggle unsafe MSRs, then either:

  • we should tolerate #GP with interrupts disabled (which we already do I believe), or
  • we need the scheduler to provide a hard guarantee that a thread bound to a specific CPU will be scheduled "soon", even if there is some high-priority realtime-class thread running on the CPU
sys/x86/x86/cpu_machdep.c
285

I don't really understand--why not? Can't we handle #GP with interrupts disabled?