Differential D1963

reduce cache coherence traffic on br_ring accesses
Abandoned, Public

Authored by kmacy on Feb 24 2015, 10:14 PM.

Details

Reviewers

Summary

Short of creating separate buf_rings per package, there is no way to avoid a steady stream of coherence traffic on br_prod updates: by definition, many threads are simultaneously trying to acquire an index by updating it. However, once a producer has a unique index, there is no intrinsic cache line sharing with other producers. With the current implementation, if thread A is on package 1 and thread B is on package 2 and both are producing a steady stream of updates, each cache line of br_ring[] can change ownership as many as CACHE_LINE_SIZE/sizeof(void *) times, once per entry it holds. If we instead pad each entry out to CACHE_LINE_SIZE, this ping-ponging between producers can be avoided entirely.
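To make the sharing argument concrete, here is a minimal sketch of the two layouts. It is not the code in this diff: `struct padded_slot`, `slot_line()`, `slots_share_line()` and `check_layouts()` are hypothetical names, and the 64-byte CACHE_LINE_SIZE is an assumption (the kernel takes the real value from the machine headers).

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines. The kernel defines the real value. */
#define CACHE_LINE_SIZE 64

/*
 * Hypothetical padded entry: the alignment requirement rounds the size of
 * the struct up to CACHE_LINE_SIZE, so each entry owns a whole cache line.
 */
struct padded_slot {
	alignas(CACHE_LINE_SIZE) void *ptr;
};

/* Cache line (counted from the start of the ring) that holds entry idx. */
static size_t
slot_line(size_t idx, size_t slot_size)
{
	return ((idx * slot_size) / CACHE_LINE_SIZE);
}

/* Nonzero if producers holding indices a and b write to the same line. */
static int
slots_share_line(size_t a, size_t b, size_t slot_size)
{
	return (slot_line(a, slot_size) == slot_line(b, slot_size));
}

/* Two producers that acquired adjacent indices 0 and 1. */
static void
check_layouts(void)
{
	/* Packed void * entries: both stores hit line 0. */
	assert(slots_share_line(0, 1, sizeof(void *)));
	/* Padded entries: each store lands on its own line. */
	assert(!slots_share_line(0, 1, sizeof(struct padded_slot)));
}
```

With the packed layout, every group of CACHE_LINE_SIZE/sizeof(void *) consecutive indices maps to one line, which is why producers on different packages keep pulling the same line back and forth. The contention on br_prod itself is unchanged by either layout.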

The motivation for this change is that, at least on some interconnects such as AMD's HyperTransport, the number of coherence messages the link can carry per second is lower than its raw speed might imply. In other words, although latency is very low, the effective bandwidth for coherence traffic is not that high.

Because this clearly explodes the size of the ring by a factor of CACHE_LINE_SIZE/sizeof(void *), I'm merely putting this out there and am not (currently) championing it. I seek (informed) commentary.
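For a sense of scale, the blow-up can be worked out directly. The helpers below are a hypothetical sketch, not part of the diff, and again assume a 64-byte CACHE_LINE_SIZE.

```c
#include <stddef.h>

/* Assumption: 64-byte cache lines. The kernel defines the real value. */
#define CACHE_LINE_SIZE 64

/* Bytes used by br_ring[] when entries are packed void * pointers. */
static size_t
ring_bytes_packed(size_t entries)
{
	return (entries * sizeof(void *));
}

/* Bytes used by br_ring[] when each entry is padded to a cache line. */
static size_t
ring_bytes_padded(size_t entries)
{
	return (entries * CACHE_LINE_SIZE);
}

/* Growth factor of the padded ring over the packed ring. */
static size_t
padding_factor(void)
{
	return (CACHE_LINE_SIZE / sizeof(void *));
}
```

Taking an illustrative 4096-entry ring with 8-byte pointers, the packed array occupies 32 KiB and the padded one 256 KiB, a factor of 8. With 4-byte pointers the factor is 16.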

Diff Detail

Lint: Lint Skipped

Unit: Tests Skipped

Event Timeline

kmacy updated this revision to Diff 3967. Feb 24 2015, 10:14 PM

kmacy retitled this revision to "reduce cache coherence traffic on br_ring accesses".

kmacy updated this object.

kmacy edited the test plan for this revision.

kmacy added a reviewer: imp.

kmacy added subscribers: andrew, rpaulo, zbb and 4 others.

kmacy abandoned this revision. Aug 30 2017, 11:08 PM

Revision Contents
Changeset List

Path: sys/buf_ring.h

Size: 34 lines

Diff 3967
