Add SMR_LIST_* macros.
Needs ReviewPublic
Actions

Authored by markj on Mar 9 2020, 11:41 PM.

Details

Reviewers

jeff
rlibby
mjg
kib

Summary

This is a basic reimplementation of LIST_* which ensures that list
element updates are visible before the store which adds a new element to
a list. I am posting it mostly for discussion; there are some
outstanding issues:

No SLIST/STAILQ yet. I will write these if the approach taken here seems reasonable.
There is only one LIST_FOREACH implementation, meaning that writers are unnecessarily issuing acquire barriers. We could add reader/writer variants, or add an enum to indicate the sense.
I have not convinced myself that the acquire barriers are necessary to begin with. The release store of the linkage pointer in LIST_INSERT_* should synchronize with the acquire in smr_enter(). Similarly, SMR_LIST_REMOVE shouldn't require barriers.
I did not add assertion fields to all macros yet.
I added a hack to allow the use of a void expression in the assertion parameter. e.g., I want to be able to write SMR_LIST_REMOVE(..., mtx_assert(MA_OWNED)). It seems a bit fragile to me though.

I ported D23913 to use the new macros and the changes weren't too bad.

Diff Detail

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 29857
Build 27679: arc lint + arc unit

Event Timeline

markj created this revision.Mar 9 2020, 11:41 PM

Harbormaster completed remote builds in B29857: Diff 69341.Mar 9 2020, 11:42 PM

markj added reviewers: jeff, rlibby, mjg, kib.Mar 9 2020, 11:42 PM

jeff added inline comments.Mar 9 2020, 11:50 PM

sys/sys/_smr.h
46	Is there a functional change here or is this style?
sys/sys/smr_types.h
149	I would really like to carry forward the asserts.

CK_* macros are basically queue.sh + augmentation for SMP. They also happen to be a known quantity so to speak and preferably smr would just wrap them and add smr-specific asserts. I don't know if this will pose any problems due to type difference. Should wrapping not work out I think the most pragmatic take would be to steal said macros, s/CK/SMR/ and point at the original stating this renames and adds smr specific stuff but is otherwise equivalent.

As a bare minimum the hand rolled code should match fences 1:1, even if it so happens something somewhere is overzealous.

markj added inline comments.Mar 9 2020, 11:56 PM

sys/sys/_smr.h
46	This one is just style. There are two functional changes to this file: I added _smr_ex_eval() I stopped printing the assertion expression in SMR_ASSERT(). With larger assertions this gets super ugly, one of the downsides of _smr_ex_eval(). I think the file and line should be sufficient though, so I'm on the fence about it.
sys/sys/smr_types.h
149	Ok. I will add internal variants of some of the macros so that e.g., SMR_FOREACH does not repeatedly evaluate the assertion expression. I think that might be too expensive even for INVARIANTS kernels.

In D24010#527924, @mjg wrote:

CK_* macros are basically queue.sh + augmentation for SMP. They also happen to be a known quantity so to speak and preferably smr would just wrap them and add smr-specific asserts. I don't know if this will pose any problems due to type difference. Should wrapping not work out I think the most pragmatic take would be to steal said macros, s/CK/SMR/ and point at the original stating this renames and adds smr specific stuff but is otherwise equivalent.

As a bare minimum the hand rolled code should match fences 1:1, even if it so happens something somewhere is overzealous.

CK macros also depend on the CK atomics API. I would prefer that we use atomic(9) everywhere. It reduces the cognitive burden on someone reading code, and it means that all of our code benefits when someone improves atomic(9). For example, on armv8.1 atomic(9) uses LSE instructions, but anything using CK does not benefit from that.

Well that puts CK itself in a rather peculiar spot, what are the expectations of current CK users? Perhaps this would make an argument for replacing internal CK atomics with FreeBSD ones.

That said, if you insist on custom implementation, the least we can do is stick to what CK is doing -- that is take it, rename, stuff, add asserts and point people at the original. Cursory comparison shows that the implementation already differs and I don't know if we really want to spend time validating it's fine, especially this early.

In D24010#527929, @mjg wrote:

Well that puts CK itself in a rather peculiar spot, what are the expectations of current CK users?

What do you mean? They can continue using CK unless they are converted to use the SMR KPIs. I think the existing code would benefit from some auditing and annotation to indicate exactly which accesses are protected by a given net epoch section.

Perhaps this would make an argument for replacing internal CK atomics with FreeBSD ones.

Assuming that they can all be mapped cleanly, that would address one of my complaints but not the other, which is that having two KPIs for atomics is confusing.

That said, if you insist on custom implementation, the least we can do is stick to what CK is doing -- that is take it, rename, stuff, add asserts and point people at the original. Cursory comparison shows that the implementation already differs and I don't know if we really want to spend time validating it's fine, especially this early.

It probably can't cleanly be done. The _FOREACH macros have some lexical constraints that mean that you can't just add an assertion statement before the loop.

The implementation is simple enough and I don't think it really needs careful review. The only detail that I am unsure about is the use of acquire loads, and CK's implementation isn't clear to me. For instance, these loops are different in that the first issues load fences:

CK_LIST_FOREACH_SAFE(v, &list, field, tmp)
    free(v);

while ((v = CK_LIST_FIRST(&list)) != NULL) {
    CK_LIST_REMOVE(v);
    free(v);
}

... but I can't see why that should be.

In D24010#527930, @markj wrote:

In D24010#527929, @mjg wrote:

Well that puts CK itself in a rather peculiar spot, what are the expectations of current CK users?

What do you mean? They can continue using CK unless they are converted to use the SMR KPIs. I think the existing code would benefit from some auditing and annotation to indicate exactly which accesses are protected by a given net epoch section.

Well I meant should new atomics land, they wont use them and that involves important code in the network stack.

Perhaps this would make an argument for replacing internal CK atomics with FreeBSD ones.

Assuming that they can all be mapped cleanly, that would address one of my complaints but not the other, which is that having two KPIs for atomics is confusing.

That said, if you insist on custom implementation, the least we can do is stick to what CK is doing -- that is take it, rename, stuff, add asserts and point people at the original. Cursory comparison shows that the implementation already differs and I don't know if we really want to spend time validating it's fine, especially this early.

It probably can't cleanly be done. The _FOREACH macros have some lexical constraints that mean that you can't just add an assertion statement before the loop.

Wrapping can indeed be a problem, but modifying should be pretty straightforward. From your own patch, you embedded the assert within for(), you can do the same thing with the code taken from CK.

The implementation is simple enough and I don't think it really needs careful review.

I have to strongly disagree on this point. Fences are notoriously tricky and unfortunately the prime testing platform -- amd64 -- is very tolerant of mistakes on this front. I don't see any reason to deviate here. It should not be hard nor time consuming to rework this on top of CK and it will provide the peace of mind here.

The only detail that I am unsure about is the use of acquire loads, and CK's implementation isn't clear to me. For instance, these loops are different in that the first issues load fences:
CK_LIST_FOREACH_SAFE(v, &list, field, tmp)
    free(v);

while ((v = CK_LIST_FIRST(&list)) != NULL) {
    CK_LIST_REMOVE(v);
    free(v);
}
... but I can't see why that should be.

Where is this code from? The second example is presumably only to be executed with locks held or when the list is no longer visible to anyone but curthread, making additional synchronisation spurious. I don't know why CK_LIST_FOREACH_SAFE has the fence, I can only speculate this was to allow for it being used in read-only paths.

I agree with mark. CK_LIST is just a copy of queue.h with ck barriers added. We want stronger assertions to tie to the smr interface. We have wide arm64 platforms available to us for testing now. I think the risk of incorrect barriers is minimal if we are conservative. The only real question is how often to require acquire load.

I think we should have read side and write side macros. It simplifies things to only require coherent forward traversal and fully serialized writers.

One advantage to running the assert on every loop iteration is that the caller could drop the lock during the foreach. write lock asserts should be cheap.

In D24010#527931, @mjg wrote:

In D24010#527930, @markj wrote:

The implementation is simple enough and I don't think it really needs careful review.

I have to strongly disagree on this point. Fences are notoriously tricky and unfortunately the prime testing platform -- amd64 -- is very tolerant of mistakes on this front. I don't see any reason to deviate here. It should not be hard nor time consuming to rework this on top of CK and it will provide the peace of mind here.

Sure, I am not saying that the fences are straightforward, they are one of the issues I listed in the review description. The bulk of the diff is just implementing a doubly linked list, which is straightforward.

The main questions I have are about deviations from CK's interface:

should we have separate FOREACH macros for readers and writers?
do we need acquire loads at all?

The only detail that I am unsure about is the use of acquire loads, and CK's implementation isn't clear to me. For instance, these loops are different in that the first issues load fences:
CK_LIST_FOREACH_SAFE(v, &list, field, tmp)
    free(v);

while ((v = CK_LIST_FIRST(&list)) != NULL) {
    CK_LIST_REMOVE(v);
    free(v);
}
... but I can't see why that should be.
Where is this code from? The second example is presumably only to be executed with locks held or when the list is no longer visible to anyone but curthread, making additional synchronisation spurious. I don't know why CK_LIST_FOREACH_SAFE has the fence, I can only speculate this was to allow for it being used in read-only paths.

It is a constructed example. Both patterns appear in the tree in lots of places. Readers can expose the same difference, so I am still uncertain about CK's use of load fences.

CK_LIST_FOREACH(v, &list, field)
    <do something>;

v = CK_LIST_FIRST(&list);
do {
    <do something>;
} while ((v = CK_LIST_NEXT(&list)) != NULL);

But my proposal does not loosen anything in terms of assertions, especially smr. In fact I explicitly noted they should be added.

To your own point, CK is queue.h + maybe some code reordering for smp + barriers in place. SMR would can be literally the same thing + SMR-specific assertions. As proposed, the code differs in how it uses fences and nobody provided a justification for said difference. I really don't see why would you want to hope to find the problem in testing when we can be significantly more likely to get the correct code by basing the work on CK.

mjg added inline comments.Mar 10 2020, 2:07 AM

sys/sys/smr_types.h
168	_next gets modified before stores to elm are guaranteed to be visible. meaning someone traversing backwards can fetch an object which is not fully initialized
176	similarly to the previous case, but this is forward traversal
186	similar problem to the above. all these cases need a release fence for elm itself. there may be more stuff lurking here. Just repurpose CK macros.

In D24010#527936, @mjg wrote:

But my proposal does not loosen anything in terms of assertions, especially smr. In fact I explicitly noted they should be added.

To your own point, CK is queue.h + maybe some code reordering for smp + barriers in place. SMR would can be literally the same thing + SMR-specific assertions. As proposed, the code differs in how it uses fences and nobody provided a justification for said difference.

The effects of these fences leak in to consumers. So the semantics and guarantees of these macros need to be well understood and documented. The main purpose of the review is to nail down exactly what those semantics are.

I really don't see why would you want to hope to find the problem in testing when we can be significantly more likely to get the correct code by basing the work on CK.

I don't expect to find bugs by testing and I'm not satisfied with "more likely to get the correct code." I would like to be certain that these macros are well-understood. If the result is exactly the same as CK, then ok, but I am still unhappy about having to keep multiple atomics APIs in my head.

sys/sys/smr_types.h
168	There is no SMR_LIST_PREV() or FOREACH_REVERSE() for exactly this reason, same as CK.

In D24010#527944, @markj wrote:

In D24010#527936, @mjg wrote:

But my proposal does not loosen anything in terms of assertions, especially smr. In fact I explicitly noted they should be added.

To your own point, CK is queue.h + maybe some code reordering for smp + barriers in place. SMR would can be literally the same thing + SMR-specific assertions. As proposed, the code differs in how it uses fences and nobody provided a justification for said difference.

The effects of these fences leak in to consumers. So the semantics and guarantees of these macros need to be well understood and documented. The main purpose of the review is to nail down exactly what those semantics are.

Then take the CK macros and translate the stuff. Fences used there directly translate to what we have. Should someone want to further change this (e.g., to stop issuing "naked" fence_rel) that would be up for discussion after the dust settles.

I really don't see why would you want to hope to find the problem in testing when we can be significantly more likely to get the correct code by basing the work on CK.

I don't expect to find bugs by testing and I'm not satisfied with "more likely to get the correct code." I would like to be certain that these macros are well-understood. If the result is exactly the same as CK, then ok, but I am still unhappy about having to keep multiple atomics APIs in my head.

I was responding to jeff, your comment was posted in-between.

mjg added inline comments.Mar 10 2020, 2:37 AM

sys/sys/smr_types.h
168	Even if going backwards is not supported this has to be explicitly stated, otherwise someone can get beaten by an attempt to implement it. Also the case below is not handled. This bug would be trivially avoided if the code was based on CK.

jeff added inline comments.Mar 10 2020, 2:48 AM

sys/sys/smr_types.h
176	The prev pointer points to the next field of the previous structure. This correctly does the release barrier on the next update.
186	The release fence on the next pointers is sufficient to guarantee that another caller doing a acquire fence on the load will see a consistent element. Please stop repeating the same feedback. We heard you and don't agree. It is now beyond redundant.

Semantics can be agreed upon as follows (this partially repeats itself to reinforce the point):

lookup:

consumer is within the relevant smr section
newly inserted objects can be missed during traversal
if a newly inserted object is found, all stores to the object which proceeded the insertion are guaranteed to be visible
deleted objects may be spuriously found, it is up for the consumer to handle it. all stores to the object which proceeded the deletion are guaranteed to be visible
it is illegal access found objects outside of the smr section unless other means of protection were put in place (e.g., reference counting)

insertion:

mutual exclusion against other writers must be handled by the consumer
insertions of new elements must be only issued after they are constructed to a point which is safe for inspection by code traversing in a smr-protected section. the insertion routine posts appropriate fences.

deletion:

mutual exclusion against other writers must be handled by the consumer
deletion order must provide the lookup code with means of detecting that destruction is underway
all stores to the object prior to issuing removal are visible

Maybe I missed something but this should do the trick. Modulo bugs this should be expected from any implementation imo.

In D24010#527950, @mjg wrote:

Semantics can be agreed upon as follows (this partially repeats itself to reinforce the point):

lookup:

consumer is within the relevant smr section

newly inserted objects can be missed during traversal

if a newly inserted object is found, all stores to the object which proceeded the insertion are guaranteed to be visible

deleted objects may be spuriously found, it is up for the consumer to handle it. all stores to the object which proceeded the deletion are guaranteed to be visible

it is illegal access found objects outside of the smr section unless other means of protection were put in place (e.g., reference counting)

insertion:

mutual exclusion against other writers must be handled by the consumer

insertions of new elements must be only issued after they are constructed to a point which is safe for inspection by code traversing in a smr-protected section. the insertion routine posts appropriate fences.

deletion:

mutual exclusion against other writers must be handled by the consumer

deletion order must provide the lookup code with means of detecting that destruction is underway

all stores to the object prior to issuing removal are visible