sys/kern/kern_rmlock.c
_rm_rlock_hard(struct rmlock *rm, struct rm_priotracker *tracker, int trylock)
 	if (!CPU_ISSET(pc->pc_cpuid, &rm->rm_writecpus)) {
 		critical_exit();
 		return (1);
 	}
 	/* Remove our tracker from the per-cpu list. */
 	rm_tracker_remove(pc, tracker);
-	/* Check to see if the IPI granted us the lock after all. */
+	/*
+	 * Check to see if the IPI granted us the lock after all. The load of
+	 * rmp_flags must happen after the tracker is removed from the list.
+	 */
+	__compiler_membar();
 	if (tracker->rmp_flags) {
rlibby: Why don't we need atomic semantics for this? Because the IPI stores are from the same CPU? Should we comment something to that effect?

Anyway, if that's the case, maybe this should just be a volatile load or atomic_load_int()? I don't think we care about any other ordering except that with the earlier volatile store to the per-CPU list pointer (e.g., we don't care when rm->lock_object.lo_flags is loaded).
markj: Right, we assume that a plain load is atomic with respect to the current CPU; that assumption is made all over the kernel. I can expand the comment.

Are you suggesting that we should use an atomic_load_int() instead of the __compiler_membar()? I think that would fix the problem as well: as you noted, the queue linkage pointers are declared volatile, so using atomic_load_int() would prevent reordering of the (volatile) stores with the load of rmp_flags. I would also wrap accesses of the linkage fields with atomic_store_ptr() in that case. I noticed that __compiler_membar() is used elsewhere in the read path, so I went with that here.
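To make the two options being discussed concrete, here is a minimal sketch. The struct and field names are hypothetical stand-ins for the rmlock structures; only the ordering pattern is taken from the patch, and the __compiler_membar() definition mirrors the kernel macro so the sketch is self-contained.

	/*
	 * Sketch only: the IPI handler that sets rmp_flags runs on this
	 * same CPU, so only compiler ordering is at stake here.
	 */
	#define	__compiler_membar()	__asm __volatile("" ::: "memory")

	struct tracker {
		struct tracker *volatile rmq_next;	/* queue linkage: volatile */
		struct tracker *volatile rmq_prev;
		int rmp_flags;				/* set by the IPI handler */
	};

	static int
	rlock_hard_tail(struct tracker *t)
	{
		/* Unlink from the per-CPU queue: these are volatile stores. */
		t->rmq_prev->rmq_next = t->rmq_next;
		t->rmq_next->rmq_prev = t->rmq_prev;

		/*
		 * Option A (the patch as posted): a compiler barrier keeps
		 * the plain load of rmp_flags from being hoisted above the
		 * volatile stores.
		 */
		__compiler_membar();
		return (t->rmp_flags);

		/*
		 * Option B (rlibby's suggestion): rely on atomic_load_int()
		 * being a volatile load, which the compiler may not reorder
		 * with the volatile queue stores:
		 *
		 *	return (atomic_load_int(&t->rmp_flags));
		 */
	}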
rlibby: Well, first, I think both work, and I don't want to insist on the style, so I'll leave it to your judgement. I'd note that the __compiler_membar()s in _rm_rlock() are really just a hand-inlining of critical_enter()/critical_exit() (without the preemption check). Personally, given the use of volatile for the list pointers, atomic_load_int() makes a little more sense to me here as a reader. Equally, I think this could be rewritten with __compiler_membar() instead of volatile for the list pointers.
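For context, a rough sketch of the hand-inlining rlibby refers to. The struct thread and curthread definitions are hypothetical stand-ins so the snippet compiles on its own; this is not the verbatim _rm_rlock() body, just the pattern his comment describes.

	#define	__compiler_membar()	__asm __volatile("" ::: "memory")

	struct thread { int td_critnest; };
	static struct thread thread0;
	#define	curthread	(&thread0)

	static void
	rlock_fast_path(void)
	{
		struct thread *td = curthread;

		td->td_critnest++;	/* open-coded critical_enter() */
		__compiler_membar();	/* keep the body inside the section */

		/* ... fast-path read-lock attempt ... */

		__compiler_membar();
		td->td_critnest--;	/* open-coded critical_exit(), minus
					 * the preemption check */
	}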
 		/* Just add back tracker - we hold the lock. */
 		rm_tracker_add(pc, tracker);
kib: But an atomic load without a fence does not provide any 'after' guarantees. It might change the compiler output, but that is coincidental. Also, this only changes the processor ordering, not what other threads could observe, unless I am missing something.
markj: Note that the problem here involves synchronization with an IPI on the same CPU. The use of atomic_load_int() ensures that the load will not be reordered by the compiler with the stores in rm_tracker_remove(), since the queue linkage fields are declared volatile and the compiler may not reorder volatile accesses with each other. I think this is actually more explicit with __compiler_membar(): it is not very common to use atomic_* to enforce compiler ordering constraints. But see the earlier comments below.
kib: Well, no: atomic_load/store without fences are relaxed, and the compiler is not guaranteed to preserve their order when they access different locations. I do not think the gcc semantics for volatile t * provide any program-order guarantees either, and even if they did, that is only a detail of our atomic_load() implementation. So the semantic needed is exactly that of C11 atomic_signal_fence(). Maybe we should put that into atomic_common.h?

diff --git a/sys/sys/atomic_common.h b/sys/sys/atomic_common.h
index 48f0a8b8939c..11301b5a469f 100644
--- a/sys/sys/atomic_common.h
+++ b/sys/sys/atomic_common.h
@@ -78,4 +78,6 @@
 #define	atomic_load_consume_ptr(p)	\
     ((__typeof(*p)) atomic_load_acq_ptr((uintptr_t *)p))
 
+#define	atomic_signal_fence()	__compiler_membar()
+
 #endif

Maybe we should call it atomic_interrupt_fence() instead (also avoiding a conflict with C11).
markj: The compiler must not reorder volatile accesses, even to different locations, and here I'm relying on the implementation property of atomic_load_int() that it is a volatile load. That, combined with the fact that rmp_cpuQueue.rmq_(next|prev) are declared volatile, is sufficient to give the correct ordering, I believe. But as you note, this is an abuse of implementation details, so the original patch was preferable. I will revert to __compiler_membar() and commit. I would prefer to add atomic_interrupt_fence() separately, to simplify MFCs and because there are other uses of __compiler_membar() in this file (see rms_int_membar). atomic_interrupt_fence() seems like the right name for the kernel.
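For reference, a sketch of what that separate follow-up might look like under the name markj prefers. The definition follows kib's diff above with only the name swapped; the comment wording is an assumption, not committed text.

	/*
	 * Hypothetical follow-up to kib's diff: a compiler-only fence
	 * ordering memory accesses against interrupt and IPI handlers that
	 * run on the same CPU. The userland analogue is C11's
	 * atomic_signal_fence(memory_order_seq_cst).
	 */
	#define	atomic_interrupt_fence()	__compiler_membar()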
 		critical_exit();
 		return (1);
 	}
 	/*
 	 * We allow readers to acquire a lock even if a writer is blocked if
 	 * the lock is recursive and the reader already holds the lock.
 	 */