Page MenuHomeFreeBSD

intrng: always use atomic increment for interrupt counter
AbandonedPublic

Authored by ehem_freebsd_m5p.com on Jun 26 2022, 2:38 AM.
Tags
None
Referenced Files
Unknown Object (File)
Feb 17 2024, 12:10 PM
Unknown Object (File)
Jan 1 2024, 6:58 AM
Unknown Object (File)
Oct 21 2023, 4:02 PM
Unknown Object (File)
Sep 4 2023, 12:55 AM
Unknown Object (File)
May 25 2023, 1:17 PM
Unknown Object (File)
Feb 17 2023, 3:07 AM
Unknown Object (File)
Jan 16 2023, 9:39 AM
Unknown Object (File)
Dec 15 2022, 7:28 AM
Subscribers

Details

Reviewers
markj
mmel
skra
Summary

Could be atomic operations were expensive in the past, yet now the
branch is likely more expensive. As such simply always use atomic add
for counter increment.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 46123
Build 43012: arc lint + arc unit

Event Timeline

This looks odd. This looks really odd. True, use of atomics might merely be required for PPI, but I'm wondering why one would bother to avoid atomics for non-PPI. Are there some processors where the atomic add is really expensive? Otherwise it seems more sensible to simply blanket use atomic add.

This really should be done for the stray counter too.

This looks odd. This looks really odd. True, use of atomics might merely be required for PPI, but I'm wondering why one would bother to avoid atomics for non-PPI. Are there some processors where the atomic add is really expensive? Otherwise it seems more sensible to simply blanket use atomic add.

Isn't non-PPI the common case? It makes sense to me to want to avoid atomics when possible, then. A related problem is that interrupt counters are allocated from a single array of longs, so it's possible for counters for distinct interrupts on different cores to reside in the same cache line. In this case, LL/SC atomics could be quite expensive if different cores are contending for ownership of the cache line.

I'm skeptical that unconditional use of atomics is preferable to what's there today. I don't have numbers, but I know that counter(9)'s use of atomics to implement per-CPU counters on arm/arm64 has a non-negligible cost (in the sense that some networking benchmarks benefit from removing packet counters). A possible compromise, which admittedly will require more work, would be to use counter(9) for interrupt counters. Then, a counter increment will involve an atomic operation on some platforms, but at least the affected cache line will be private to the CPU, and you wouldn't need a branch. This approach would be welcome on x86 as well, I think.

Ah ha. Nothing I'd seen had referenced this, notably neither 686450c8984c nor bff6be3e9b2c. That does make sense.

What seems to going on is the intrcnt/intrnames interface simply seems badly inadequate now. The counters should really be attached to either the intr_event structure, or the various architecture "src" structures. Spreading them out would likely push them onto distinct cache lines and fix rather a lot of issues (see D35608).

Abandoning this, D36610 gets the issue of spreading the interrupt counters apart and likely onto separate cache lines.