
amd64: clean up cpu_switch.S
ClosedPublic

Authored by mjg on Mar 12 2019, 3:34 PM.

Details

Summary
  • The LK macro (conditional on SMP for the lock prefix) is unused.
  • SETLK unnecessarily performs xchg: the obtained value is never used, and the implicit lock prefix adds avoidable cost. There are no correctness issues here either, due to x86 ordering guarantees.
  • Waiting for blocked_lock to clear first reads the lock word, pauses unconditionally, and only then tests the lock to see whether it needs to loop. I don't know if looping is possible with the current code; at least in my tests with poudriere there were no instances of such looping. The least that can be done here is to move the unlikely (if possible) case out of the common path; see the sketch below.
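A minimal C sketch of the loop restructuring described in the last bullet (illustrative only: `blocked_lock`, `cpu_spinwait`, and the function names stand in for the kernel's, and the real code is cpu_switch.S assembly):

```c
#include <stdatomic.h>

struct mtx;				/* opaque stand-in for the kernel type */
extern struct mtx blocked_lock;		/* sentinel "still switching" lock */

#define	cpu_spinwait()	__asm __volatile("pause")

/* Before: load the lock word, pause unconditionally, then test. */
static void
wait_unblocked_old(struct mtx *_Atomic *lockp)
{
	struct mtx *v;

	do {
		v = atomic_load_explicit(lockp, memory_order_acquire);
		cpu_spinwait();
	} while (v == &blocked_lock);
}

/* After: test first, so the common (uncontended) case never pauses;
 * the pause runs only on the unlikely looping path. */
static void
wait_unblocked_new(struct mtx *_Atomic *lockp)
{
	while (atomic_load_explicit(lockp, memory_order_acquire) ==
	    &blocked_lock)
		cpu_spinwait();
}
```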
Test Plan

poudriere, buildworld etc.

Diff Detail

Repository
rS FreeBSD src repository

Event Timeline

mjg created this revision. Mar 12 2019, 3:34 PM
mjg edited the summary of this revision. Mar 12 2019, 3:35 PM
mjg added a reviewer: jeff.
kib added inline comments. Mar 12 2019, 4:29 PM
sys/amd64/amd64/cpu_switch.S
141 (On Diff #54987)

My belief is that we 'unlock'/release the old thread's lock there, and 'lock'/acquire the new thread's lock. It is not a true acquire because the lock is distributed and ownership was taken by another CPU, but it is very close.

With the xchg->mov change, we lose the acquire semantics. Note that for actual spinlocks we release with a plain mov, but acquire with LOCK.

I do not think that the lack of acquire changes anything for the context switch itself, but could it affect the thread code executed after the switch? With this patch it is possible to get a completely lock-less/serialization-less path through cpu_switch.
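
For illustration, a hedged C sketch of the two SETLK variants (`td_lock` and the function names are hypothetical stand-ins, not the kernel code; on x86-64 a seq_cst exchange compiles to xchg with its implicit LOCK, while a release store compiles to a plain mov):

```c
#include <stdatomic.h>

struct mtx;				/* opaque stand-in for the kernel type */
extern _Atomic(struct mtx *) td_lock;	/* hypothetical lock word */

/* Old SETLK: xchg carries an implicit LOCK prefix, a full barrier. */
void
setlk_xchg(struct mtx *m)
{
	(void)atomic_exchange_explicit(&td_lock, m, memory_order_seq_cst);
}

/* New SETLK: a release store; on x86-64 this is a plain mov, so the
 * implicit full barrier is gone -- the store keeps release ordering,
 * but no longer provides acquire semantics. */
void
setlk_mov(struct mtx *m)
{
	atomic_store_explicit(&td_lock, m, memory_order_release);
}
```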

kib added inline comments. Mar 12 2019, 4:40 PM
sys/amd64/amd64/cpu_switch.S
141 (On Diff #54987)

See r334851 for a real-world issue identical to this one, though of course on a much more relaxed arch. I do not claim that the same problem would exist on strongly ordered x86, but the case is very similar.

mjg added a comment. Aug 28 2019, 2:54 AM

So I stand by the patch, modulo the xchg change. I can re-upload if it makes a difference. Any comments regarding the rest?

sys/amd64/amd64/cpu_switch.S
141 (On Diff #54987)

I agree this removes a barrier, but I don't see why one would be needed here. All stores will happen in the same order, and the only reads which can possibly leak up are fine. I don't think this is an important enough bit to fight over, and if dropping it avoids a possibly very esoteric bug I can just drop this part.

489 (On Diff #54987)

I added an atomic counter here and verified that this loop does indeed get executed, albeit incredibly rarely.
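
Such instrumentation can be as simple as the following sketch (hypothetical; `blocked_spin_count` is not the exact counter used):

```c
#include <stdatomic.h>

struct mtx;

/* Hypothetical debug counter, bumped once per spin iteration. */
static atomic_ulong blocked_spin_count;

static void
wait_unblocked_counted(struct mtx *_Atomic *lockp, struct mtx *blocked)
{
	while (atomic_load_explicit(lockp, memory_order_acquire) == blocked) {
		atomic_fetch_add_explicit(&blocked_spin_count, 1,
		    memory_order_relaxed);
		__asm __volatile("pause");
	}
}
```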

kib accepted this revision. Aug 28 2019, 10:25 AM
kib added inline comments.
sys/amd64/amd64/cpu_switch.S
141 (On Diff #54987)

I think that we will have to add e.g. LFENCE there due to hypothetical speculative-execution mitigations. Commit now as is; we will cope if needed.
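
A sketch of what such a barrier could look like (hypothetical; nothing of the sort was committed as part of this review):

```c
/* On x86, LFENCE does not let later instructions begin executing,
 * even speculatively, until prior loads have completed. */
static inline void
speculation_barrier(void)
{
	__asm __volatile("lfence" ::: "memory");
}
```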

This revision is now accepted and ready to land. Aug 28 2019, 10:25 AM
This revision was automatically updated to reflect the committed changes.