so it is visible to a CPU which might pick up the thread for execution.
On Intel, all pending writes, including the contents of write-combining buffers, are flushed by a locked operation, so the thread lock taken during the switch should be enough. For some AMD models, the APM contradicts itself: one place states that a locked operation is enough, but the description of CLFLUSH explicitly says that on models without CLFLUSHOPT only MFENCE would do.
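
The reasoning above can be sketched in userspace C as follows. This is only an illustration and not the code from this change: the names need_mfence_on_switch, fake_thread_lock and sketch_switch_out_barrier are hypothetical, and the flag is assumed to be set elsewhere when the CPU is an AMD model without CLFLUSHOPT. The sketch takes the more conservative reading of the APM and issues MFENCE in addition to the locked operation on such models.

```c
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__SSE2__)
#include <emmintrin.h>	/* _mm_mfence() */
#endif

/*
 * Hypothetical tunable: assumed to be set at boot when the CPU is an
 * AMD model that lacks CLFLUSHOPT.
 */
static bool need_mfence_on_switch;

/* Stand-in for the thread lock taken during a context switch. */
static atomic_int fake_thread_lock;

/*
 * Returns 1 if an explicit full fence was issued in addition to the
 * locked operation, 0 otherwise.
 */
static int
sketch_switch_out_barrier(void)
{
	int fenced = 0;

	/*
	 * Locked read-modify-write.  On x86 compilers typically emit
	 * XCHG with a memory operand, which is implicitly locked.
	 */
	(void)atomic_exchange_explicit(&fake_thread_lock, 1,
	    memory_order_seq_cst);

	if (need_mfence_on_switch) {
#if defined(__SSE2__)
		_mm_mfence();
#else
		/* Portable fallback when MFENCE is not available. */
		atomic_thread_fence(memory_order_seq_cst);
#endif
		fenced = 1;
	}

	/* Drop the stand-in lock. */
	atomic_store_explicit(&fake_thread_lock, 0, memory_order_release);
	return (fenced);
}
```

With the flag clear, the locked exchange is the only barrier, which matches the Intel case described above. With the flag set, the extra fence covers the stricter statement in the CLFLUSH description.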