
GICv3 + ITS support for SMP
ClosedPublic

Authored by wma_semihalf.com on Aug 4 2015, 7:51 AM.

Details

Reviewers
None
Group Reviewers
arm64
Commits
rS286919: Add SMP support to GICv3 and ITS drivers
Summary

Introduce support for SMP for the GICv3 and ITS subsystems.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

wma_semihalf.com retitled this revision from to GICv3 + ITS support for SMP.
wma_semihalf.com updated this object.
wma_semihalf.com edited the test plan for this revision.
wma_semihalf.com added a reviewer: arm64.
wma_semihalf.com set the repository for this revision to rS FreeBSD src repository.
zbb added a subscriber: zbb. Aug 11 2015, 6:31 PM

Are there any comments on this diff?

I'd like to make sure the its_sc issue doesn't get lost, and that we end up with a broader description of the path forward here in a comment. Perhaps pick up the discussion on a mailing list?

sys/arm64/arm64/gic_v3.c
374–377 ↗(On Diff #7647)

Do these constants have #defines?

andrew added inline comments. Aug 12 2015, 1:33 PM
sys/arm64/arm64/gic_v3.c
402 ↗(On Diff #7647)

Magic

414–417 ↗(On Diff #7647)

Are there macros for these shifts?

sys/arm64/arm64/gic_v3_its.c
680 ↗(On Diff #7647)

Can we limit this to cpus in all_cpus?

698 ↗(On Diff #7647)

Could you use one of the atomic_thread_fence_ functions here?

@andrew, are you happy with this, aside from the four comments above?

This revision was automatically updated to reflect the committed changes.
zbb added a comment. Aug 19 2015, 10:39 AM

I don't see any reason to use atomic_thread_fence_xyz() in place of wmb()/rmb().
Read/write barriers are much more intuitive and atomic_thread_fence_xyz() is not used anywhere anyway (why was it even added?)

kib added a subscriber: kib. Aug 19 2015, 11:09 AM
In D3299#69832, @zbb wrote:

I don't see any reason to use atomic_thread_fence_xyz() in place of wmb()/rmb().
Read/write barriers are much more intuitive and atomic_thread_fence_xyz() is not used anywhere anyway (why was it even added?)

FreeBSD (tries to) use the C11 memory model. Due to its history, it was first perceived as the ia64 model, and then morphed into a reference to C11. The main advantage over the *mb() usage is the existence of semantics formally and precisely defined by the C++11/C11 standards, and the existence of a large body of introductory and research papers on the model, written by very competent people.

Note that the _acq/_rel synchronizes-with semantics exactly match the lock semantics of the memory model, and our locks are implemented using the acq/rel barriers in the atomics that update the lock word.

Also, I would not call the *mb() model intuitive by any means. It is usually understood by reference to the machine barrier instructions, so there are often fine details which differ between architectures (what does rmb() on x86 really do? It should be simple).

Several architectures have started providing direct support for the C11 model in their instruction sets, e.g. ARMv8, or very recent PowerPC.

zbb added a comment. Aug 20 2015, 10:41 AM
In D3299#69842, @kib wrote:
In D3299#69832, @zbb wrote:

I don't see any reason to use atomic_thread_fence_xyz() in place of wmb()/rmb().
Read/write barriers are much more intuitive and atomic_thread_fence_xyz() is not used anywhere anyway (why was it even added?)

FreeBSD (tries to) use the C11 memory model. Due to its history, it was first perceived as the ia64 model, and then morphed into a reference to C11. The main advantage over the *mb() usage is the existence of semantics formally and precisely defined by the C++11/C11 standards, and the existence of a large body of introductory and research papers on the model, written by very competent people.
Note that the _acq/_rel synchronizes-with semantics exactly match the lock semantics of the memory model, and our locks are implemented using the acq/rel barriers in the atomics that update the lock word.
Also, I would not call the *mb() model intuitive by any means. It is usually understood by reference to the machine barrier instructions, so there are often fine details which differ between architectures (what does rmb() on x86 really do? It should be simple).
Several architectures have started providing direct support for the C11 model in their instruction sets, e.g. ARMv8, or very recent PowerPC.

Thank you for the clarification. I will need to get acquainted with this topic and I will keep this in mind.

In this particular situation it was not intuitive to me for a few reasons. Firstly, we are not performing any atomic operation here. Acquire/release seemed to have nothing to do with what we want to achieve (especially in terms of the ARM architecture, where the exclusive monitor marks memory regions as exclusive on an atomic load (acquire) and clears the mark on an atomic store (release)). In our case, not only did we not perform an atomic operation, but the order was also reversed: atomic_store_rel before atomic_load_acq.

Maybe this makes sense in terms of the C11 model; that is why I will need to read more about it.

kib added a comment. Aug 20 2015, 12:54 PM
In D3299#70087, @zbb wrote:

In this particular situation it was not intuitive to me for a few reasons. Firstly, we are not performing any atomic operation here. Acquire/release seemed to have nothing to do with what we want to achieve (especially in terms of the ARM architecture, where the exclusive monitor marks memory regions as exclusive on an atomic load (acquire) and clears the mark on an atomic store (release)). In our case, not only did we not perform an atomic operation, but the order was also reversed: atomic_store_rel before atomic_load_acq.

The load-linked/store-conditional primitives, which are used to implement RMW atomics on most RISC architectures, are indeed not directly related to the barriers. The fact that all architectures supported by FreeBSD provide atomic (as in, no values can appear out of thin air) loads and stores is often taken implicitly. C11 is more permissive in this regard, only requiring that no torn values appear in variables which have a real atomic type and are accessed using the atomic_* family of functions.

There is a very detailed discussion of both the ARM and PowerPC memory models, including an explanation of the (non-existent) interaction between barriers and ll/sc, in https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf .

From a very quick look at the patch, you already have the code arranged to express the synchronizes-with relation. From your comment in the code, it seems that you should finish lpi_alloc_cpu_pendtables() with an atomic_store_rel() of pend_base, and start lpi_config_cpu() with a load_acq of pend_base. That would give you the guarantee that if lpi_config_cpu() sees the store to pend_base made by lpi_alloc_cpu_pendtables(), it also sees all previous updates from lpi_alloc_cpu_pendtables(). Is this what you wanted from the use of the rmb()/wmb() pair?