Use an atomic operation with a memory barrier loading br_cons_tail
from the producer thread and storing to it in the consumer thread.
On dequeue we need to read the pointer value from the buf_ring before
moving the consumer tail as that indicates the entry is available to be
used. The store release atomic operation guarantees this.
In the enqueueing thread we then need to use a load acquire atomic
operation to ensure writing to this entry can only happen after the
tail has been read and checked.
Reported by: Ali Saidi <alisaidi@amazon.com>
Co-developed by: Ali Saidi <alisaidi@amazon.com>
Sponsored by: Arm Ltd