intrng: always use atomic increment for interrupt counter
AbandonedPublic
Actions

Authored by ehem_freebsd_m5p.com on Jun 26 2022, 2:38 AM.

Details

Reviewers

markj
mmel
skra

Summary

Could be atomic operations were expensive in the past, yet now the
branch is likely more expensive. As such simply always use atomic add
for counter increment.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 46123
Build 43012: arc lint + arc unit

Event Timeline

ehem_freebsd_m5p.com created this revision.Jun 26 2022, 2:38 AM

Herald added a subscriber: imp. · View Herald TranscriptJun 26 2022, 2:38 AM

ehem_freebsd_m5p.com requested review of this revision.Jun 26 2022, 2:38 AM

Harbormaster completed remote builds in B46119: Diff 107392.Jun 26 2022, 2:39 AM

This looks odd. This looks really odd. True, use of atomics might merely be required for PPI, but I'm wondering why one would bother to avoid atomics for non-PPI. Are there some processors where the atomic add is really expensive? Otherwise it seems more sensible to simply blanket use atomic add.

This really should be done for the stray counter too.

Harbormaster completed remote builds in B46123: Diff 107406.Jun 26 2022, 4:37 PM

In D35602#807411, @ehem_freebsd_m5p.com wrote:

This looks odd. This looks really odd. True, use of atomics might merely be required for PPI, but I'm wondering why one would bother to avoid atomics for non-PPI. Are there some processors where the atomic add is really expensive? Otherwise it seems more sensible to simply blanket use atomic add.

Isn't non-PPI the common case? It makes sense to me to want to avoid atomics when possible, then. A related problem is that interrupt counters are allocated from a single array of longs, so it's possible for counters for distinct interrupts on different cores to reside in the same cache line. In this case, LL/SC atomics could be quite expensive if different cores are contending for ownership of the cache line.

I'm skeptical that unconditional use of atomics is preferable to what's there today. I don't have numbers, but I know that counter(9)'s use of atomics to implement per-CPU counters on arm/arm64 has a non-negligible cost (in the sense that some networking benchmarks benefit from removing packet counters). A possible compromise, which admittedly will require more work, would be to use counter(9) for interrupt counters. Then, a counter increment will involve an atomic operation on some platforms, but at least the affected cache line will be private to the CPU, and you wouldn't need a branch. This approach would be welcome on x86 as well, I think.

Ah ha. Nothing I'd seen had referenced this, notably neither 686450c8984c nor bff6be3e9b2c. That does make sense.

What seems to going on is the intrcnt/intrnames interface simply seems badly inadequate now. The counters should really be attached to either the intr_event structure, or the various architecture "src" structures. Spreading them out would likely push them onto distinct cache lines and fix rather a lot of issues (see D35608).

Abandoning this, D36610 gets the issue of spreading the interrupt counters apart and likely onto separate cache lines.

Revision Contents
Changeset List

Path

Size

sys/

kern/

subr_intr.c

7 lines

Diff 107406

View Options

intrng: always use atomic increment for interrupt counterAbandonedPublicActions