sys/kern/kern_rmlock.c
_rm_rlock_hard(struct rmlock *rm, struct rm_priotracker *tracker, int trylock)
 	if (!CPU_ISSET(pc->pc_cpuid, &rm->rm_writecpus)) {
 		critical_exit();
 		return (1);
 	}
 	/* Remove our tracker from the per-cpu list. */
 	rm_tracker_remove(pc, tracker);
-	/* Check to see if the IPI granted us the lock after all. */
+	/*
+	 * Check to see if the IPI granted us the lock after all. The load of
+	 * rmp_flags must happen after the tracker is removed from the list.
+	 */
+	__compiler_membar();
 	if (tracker->rmp_flags) {
rlibby: Why don't we need atomic semantics for this? Because the IPI stores are from the same CPU? Should we comment something to that effect?

Anyway, if that's the case, maybe this should just be a volatile load or atomic_load_int()? I don't think we care about any other ordering except that with the earlier volatile store to the per-CPU list pointer (e.g., we don't care when rm->lock_object.lo_flags is loaded).
markj: Right, we assume that a plain load is atomic with respect to the current CPU; that assumption is made all over the kernel. I can expand the comment.

Are you suggesting that we should use an atomic_load_int() instead of the __compiler_membar()? I think that would fix the problem as well: as you noted, the queue linkage pointers are declared volatile, so using atomic_load_int() would prevent reordering of the (volatile) stores with the load of rmp_flags. I would also wrap accesses of the linkage fields with atomic_store_ptr() in that case. I noticed that __compiler_membar() is used elsewhere in the read path, so I went with that here.
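To make the two options being discussed concrete, here is a minimal sketch. The struct and field names are hypothetical stand-ins for the rmlock structures; only the ordering pattern is taken from the patch, and the __compiler_membar() definition mirrors the kernel macro so the sketch is self-contained.

	/*
	 * Sketch only: the IPI handler that sets rmp_flags runs on this
	 * same CPU, so only compiler ordering is at stake here.
	 */
	#define	__compiler_membar()	__asm __volatile("" ::: "memory")

	struct tracker {
		struct tracker *volatile rmq_next;	/* queue linkage: volatile */
		struct tracker *volatile rmq_prev;
		int rmp_flags;				/* set by the IPI handler */
	};

	static int
	rlock_hard_tail(struct tracker *t)
	{
		/* Unlink from the per-CPU queue: these are volatile stores. */
		t->rmq_prev->rmq_next = t->rmq_next;
		t->rmq_next->rmq_prev = t->rmq_prev;

		/*
		 * Option A (the patch as posted): a compiler barrier keeps
		 * the plain load of rmp_flags from being hoisted above the
		 * volatile stores.
		 */
		__compiler_membar();
		return (t->rmp_flags);

		/*
		 * Option B (rlibby's suggestion): rely on atomic_load_int()
		 * being a volatile load, which the compiler may not reorder
		 * with the volatile queue stores:
		 *
		 *	return (atomic_load_int(&t->rmp_flags));
		 */
	}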
rlibby: Well, first, I think both work, and I don't want to insist on the style, so I'll leave it to your judgement. I'd note that the __compiler_membar()s in _rm_rlock() are really just a hand-inlining of critical_enter()/critical_exit() (without the preemption check). Personally, given the use of volatile for the list pointers, atomic_load_int() makes a little more sense to me here as a reader. Equally, I think this could be rewritten with __compiler_membar() instead of volatile for the list pointers.
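For context, a rough sketch of the hand-inlining rlibby refers to. The struct thread and curthread definitions are hypothetical stand-ins so the snippet compiles on its own; this is not the verbatim _rm_rlock() body, just the pattern his comment describes.

	#define	__compiler_membar()	__asm __volatile("" ::: "memory")

	struct thread { int td_critnest; };
	static struct thread thread0;
	#define	curthread	(&thread0)

	static void
	rlock_fast_path(void)
	{
		struct thread *td = curthread;

		td->td_critnest++;	/* open-coded critical_enter() */
		__compiler_membar();	/* keep the body inside the section */

		/* ... fast-path read-lock attempt ... */

		__compiler_membar();
		td->td_critnest--;	/* open-coded critical_exit(), minus
					 * the preemption check */
	}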
 		/* Just add back tracker - we hold the lock. */
 		rm_tracker_add(pc, tracker);
kib: But an atomic load without a fence does not provide any 'after' guarantees. It might change the compiler output, but that is coincidental. Also, this only changes the processor ordering, not what other threads could observe, unless I am missing something.
markj: Note that the problem here involves synchronization with an IPI on the same CPU. The use of atomic_load_int() ensures that the load will not be reordered by the compiler with the stores in rm_tracker_remove(), since the queue linkage fields are declared volatile and the compiler may not reorder volatile accesses with each other. I think this is actually more explicit with __compiler_membar(): it is not very common to use atomic_* to enforce compiler ordering constraints. But see the earlier comments below.
kib: Well, no: atomic_load/store without fences are relaxed, and the compiler is not guaranteed to preserve their order when they access different locations. I do not think the gcc semantics for volatile t * provide any program-order guarantees either, and even if they did, that is only a detail of our atomic_load() implementation. So the semantic needed is exactly that of C11 atomic_signal_fence(). Maybe we should put that into atomic_common.h?

diff --git a/sys/sys/atomic_common.h b/sys/sys/atomic_common.h
index 48f0a8b8939c..11301b5a469f 100644
--- a/sys/sys/atomic_common.h
+++ b/sys/sys/atomic_common.h
@@ -78,4 +78,6 @@
 #define	atomic_load_consume_ptr(p)	\
     ((__typeof(*p)) atomic_load_acq_ptr((uintptr_t *)p))
 
+#define	atomic_signal_fence()	__compiler_membar()
+
 #endif

Maybe we should call it atomic_interrupt_fence() instead (also avoiding a conflict with C11).
markj: The compiler must not reorder volatile accesses, even to different locations, and here I'm relying on the implementation property of atomic_load_int() that it is a volatile load. That, combined with the fact that rmp_cpuQueue.rmq_(next|prev) are declared volatile, is sufficient to give the correct ordering, I believe. But as you note, this is an abuse of implementation details, so the original patch was preferable. I will revert to __compiler_membar() and commit. I would prefer to add atomic_interrupt_fence() separately, to simplify MFCs and because there are other uses of __compiler_membar() in this file (see rms_int_membar). atomic_interrupt_fence() seems like the right name for the kernel.
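For reference, a sketch of what that separate follow-up might look like under the name markj prefers. The definition follows kib's diff above with only the name swapped; the comment wording is an assumption, not committed text.

	/*
	 * Hypothetical follow-up to kib's diff: a compiler-only fence
	 * ordering memory accesses against interrupt and IPI handlers that
	 * run on the same CPU. The userland analogue is C11's
	 * atomic_signal_fence(memory_order_seq_cst).
	 */
	#define	atomic_interrupt_fence()	__compiler_membar()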
 		critical_exit();
 		return (1);
 	}
 	/*
 	 * We allow readers to acquire a lock even if a writer is blocked if
 	 * the lock is recursive and the reader already holds the lock.
 	 */