Page MenuHomeFreeBSD

kern/intr: implement intrcnt/intrnames sysctl from event counters
AbandonedPublic

Authored by ehem_freebsd_m5p.com on Sep 16 2022, 7:23 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Jan 11, 2:54 PM
Unknown Object (File)
Sat, Jan 11, 2:16 AM
Unknown Object (File)
Thu, Jan 9, 12:48 PM
Unknown Object (File)
Sun, Jan 5, 3:48 AM
Unknown Object (File)
Sat, Dec 28, 7:27 PM
Unknown Object (File)
Fri, Dec 27, 2:26 PM
Unknown Object (File)
Fri, Dec 27, 2:07 PM
Unknown Object (File)
Fri, Dec 27, 1:51 PM

Details

Summary

This allows architectures to provide the interrupt names/counters from
intr_event, instead of the architecture tables. Since the tables are
inferior for normal interrupts, but functional for IPI interrupts this
is a substantial step forward.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 47500
Build 44387: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Updating to ensure D36610 stays synchronized with my tree. There was a small delta due to overlap with another commit.

"FreeBSD Phabricator III: The Search for Reviewers"

Two spots have multiple potential approaches and I'm unsure which to choose. Overall though this appears to have some rather nice properties since it avoids the allocation of a significant memory block. I expect this to make sysctl("hw.intrnames") use slightly more processor, but avoid consuming memory for a low-use interface.

Any word? I really would kind of like to get this in as it potentially blocks at least one device.

In its current form, this change has major problems, and is largely unreviewable. The main issue with your approach is that struct intr_event is not the correct place to stash these counters. This is evident by the fact that you are, in a few places, allocating empty (i.e. no handler) intr_event structures simply for this counter bookkeeping. This is a misuse of the structure, doing something completely outside its intended and established purpose (to execute/schedule interrupt handlers). This approach is not going to be accepted, so please set your expectations accordingly.

After some study, I agree that the intrcnt/intrnames interface is problematic. These counters belong to the machine-dependent interrupt frameworks (e.g. INTRNG), but the sysctls are implemented in the machine-independent intr_event code. This is a layering violation that forces MD code to allocate the counters and names in this inconvenient array format. An acceptable approach would be to eliminate the MI parts by moving (i.e. duplicating) the sysctl and ddb commands into each MD framework*. This will allow the implementations to evolve independently, while still maintaining the hw.intrnames/hw.intrcnt sysctl interface for userspace.

Ultimately, I believe you are searching for how to meaningfully implement isrc_release_counters(). In the INTRNG case, the counters could be stashed as part of struct intr_irqsrc, or still maintained in a dynamically-sized array, with a bitmap to track which counters are allocated. This would allow counters to be released, vs the current array+index scheme does not.

Any argument based on saved/wasted memory needs to be backed by some form of before/after measurement in order to be taken seriously.

I hope this helps.

*For the question of watchdog_fire(), I think the use of intrcnt and intrnames in that function can be dropped outright. I'm certain it was useful at one point, but since the information is available in ddb, I highly doubt anyone will miss it.

sys/arm/arm/machdep_intr.c
196

This is what I'm talking about. Allocating MAXCPU new intr_event structures only for the IPI counters is not going to fly.

In its current form, this change has major problems, and is largely unreviewable. The main issue with your approach is that struct intr_event is not the correct place to stash these counters. This is evident by the fact that you are, in a few places, allocating empty (i.e. no handler) intr_event structures simply for this counter bookkeeping. This is a misuse of the structure, doing something completely outside its intended and established purpose (to execute/schedule interrupt handlers). This approach is not going to be accepted, so please set your expectations accordingly.

struct intr_event does center on this task, but this doesn't mean peripheral portions of this task shouldn't be part of struct intr_event. Could you suggest some test for distinguishing whether a given task/field/datum should be on struct intr_event versus what I'm calling interrupt_t (structures intr_irqsrc/intsrc/powerpc_intr)?

The model I would use is to map things to pre-MSI hardware. struct intr_event would then map to the interrupt pin on the interrupting device. interrupt_t would then map to a pin on the interrupt controller. The pair together would then equate to the "interrupt".

This means ie_irq belongs on interrupt_t since it is a property of the interrupt controller and not the device (D32504). This means the counters belong on struct intr_event since they are a property of the device not the controller.
For a controller which loses all interrupt mappings under certain conditions (suspend perhaps?) and has to reestablish them, this could be handled by having an array of interrupt_t and swapping the ->isrc_event fields around (yes, there is at least one controller which needs this). This keeps the handlers, counters and name all together with minimal effort (useful for display by vmstat -i).

Needing to allocate full struct intr_event for IPI counters is wasteful. Yet perhaps more handling of IPI interrupts could be done via core functions. Quite a bit of waste could be saved by allocating counters during individual processor bring-up instead of all at once. This avoids having large intrcnt/intrnames tables which by necessity must include substantial extra space.

After some study, I agree that the intrcnt/intrnames interface is problematic. These counters belong to the machine-dependent interrupt frameworks (e.g. INTRNG), but the sysctls are implemented in the machine-independent intr_event code. This is a layering violation that forces MD code to allocate the counters and names in this inconvenient array format. An acceptable approach would be to eliminate the MI parts by moving (i.e. duplicating) the sysctl and ddb commands into each MD framework*. This will allow the implementations to evolve independently, while still maintaining the hw.intrnames/hw.intrcnt sysctl interface for userspace.

I really dislike the duplication of source code this would cause (also the interface of "hw.intrnames" is surprisingly brittle). Having played with this a bit it seems more useful to have the counters on struct intr_event. An advantage to this approach is stray interrupts which were generated inside an interrupt controller could be recorded, though nominally those shouldn't happen since the interrupt shouldn't be unmasked until after a struct intr_event was allocated.

I'm not absolutely against this approach, just the results of going the opposite direction seem rather superior to me.

Ultimately, I believe you are searching for how to meaningfully implement isrc_release_counters(). In the INTRNG case, the counters could be stashed as part of struct intr_irqsrc, or still maintained in a dynamically-sized array, with a bitmap to track which counters are allocated. This would allow counters to be released, vs the current array+index scheme does not.

Your belief is correct in spirit. Any other approach which allows intr_isrc_deregister() to not simply be a wrapper around panic() and doesn't require too much effort on my part is nominally acceptable to me.

Of those two, counters on struct intr_irqsrc would be massively superior. This would allow a simple implementation for "hw.intrcnt" whereas tracking with a bitmap or linked list would require a rather more complicated implementation.

I hope this helps.

Does. Though some more convincing is still needed.

*For the question of watchdog_fire(), I think the use of intrcnt and intrnames in that function can be dropped outright. I'm certain it was useful at one point, but since the information is available in ddb, I highly doubt anyone will miss it.

If others are okay with this I would be quite happy with this.

sys/arm/arm/machdep_intr.c
196

Effectively this is an inline implementation of intr_ipi_setup_counters(). The trade-off is this makes individual IPI counters take more space, but saves space by not having to unused space in the intrnames/intrcnt tables.

Potentially entries could be allocated during processor bring-up, so there wouldn't be unused entries. Alternatively the unused entries could be released later. Definitely a downside, but this is an initial implementation.

In its current form, this change has major problems, and is largely unreviewable. The main issue with your approach is that struct intr_event is not the correct place to stash these counters. This is evident by the fact that you are, in a few places, allocating empty (i.e. no handler) intr_event structures simply for this counter bookkeeping. This is a misuse of the structure, doing something completely outside its intended and established purpose (to execute/schedule interrupt handlers). This approach is not going to be accepted, so please set your expectations accordingly.

struct intr_event does center on this task, but this doesn't mean peripheral portions of this task shouldn't be part of struct intr_event. Could you suggest some test for distinguishing whether a given task/field/datum should be on struct intr_event versus what I'm calling interrupt_t (structures intr_irqsrc/intsrc/powerpc_intr)?

Members of struct intr_event should be primarily (with small exceptions) accessed and managed within the kern_event(9) API, that is, kern_event.c. The counters are allocated and incremented outside of this file, even in the current version of your change. This is a good indicator that struct intr_event does not (and should not) own them.

After some study, I agree that the intrcnt/intrnames interface is problematic. These counters belong to the machine-dependent interrupt frameworks (e.g. INTRNG), but the sysctls are implemented in the machine-independent intr_event code. This is a layering violation that forces MD code to allocate the counters and names in this inconvenient array format. An acceptable approach would be to eliminate the MI parts by moving (i.e. duplicating) the sysctl and ddb commands into each MD framework*. This will allow the implementations to evolve independently, while still maintaining the hw.intrnames/hw.intrcnt sysctl interface for userspace.

I really dislike the duplication of source code this would cause (also the interface of "hw.intrnames" is surprisingly brittle). Having played with this a bit it seems more useful to have the counters on struct intr_event. An advantage to this approach is stray interrupts which were generated inside an interrupt controller could be recorded, though nominally those shouldn't happen since the interrupt shouldn't be unmasked until after a struct intr_event was allocated.

I'm not absolutely against this approach, just the results of going the opposite direction seem rather superior to me.

Well, then you need to weigh your desire to get this work accepted against your belief of how the code should look. I've given you my major objections.

I hope this helps.

Does. Though some more convincing is still needed.

I am not interested in spending time trying to argue over this or convince you, so I won't. It's evident to me that nobody else wants to touch these reviews, because they just grow and grow with vague questions and assertions while remaining completely non-actionable.

sys/arm/arm/machdep_intr.c
196

Any argument based on saved/wasted memory needs to be backed by some form of before/after measurement in order to be taken seriously.

You seem intent on ignoring this bit.

In its current form, this change has major problems, and is largely unreviewable. The main issue with your approach is that struct intr_event is not the correct place to stash these counters. This is evident by the fact that you are, in a few places, allocating empty (i.e. no handler) intr_event structures simply for this counter bookkeeping. This is a misuse of the structure, doing something completely outside its intended and established purpose (to execute/schedule interrupt handlers). This approach is not going to be accepted, so please set your expectations accordingly.

struct intr_event does center on this task, but this doesn't mean peripheral portions of this task shouldn't be part of struct intr_event. Could you suggest some test for distinguishing whether a given task/field/datum should be on struct intr_event versus what I'm calling interrupt_t (structures intr_irqsrc/intsrc/powerpc_intr)?

Members of struct intr_event should be primarily (with small exceptions) accessed and managed within the kern_event(9) API, that is, kern_event.c. The counters are allocated and incremented outside of this file, even in the current version of your change. This is a good indicator that struct intr_event does not (and should not) own them.

I'm unsure which way you were confused when you typed that. Since find sys/sys sys/kern -type f -print0 | xargs -0 grep -eintr_event -l | sort =>

sys/kern/kern_clock.c
sys/kern/kern_intr.c
sys/kern/subr_intr.c
sys/sys/interrupt.h
sys/sys/intr.h

I think you meant kern_intr.c as struct intr_event is unrelated to kern_event.c. This is a good point, so taking a look at what I've got. Previously non-IPI interrupts had their counters incremented in 3 places, sys/powerpc/powerpc/intr_machdep.c: powerpc_dispatch_intr(), sys/kern/subr_intr.c: intr_isrc_dispatch() and sys/x86/x86/intr_machdep.c: intr_execute_handlers().
Afterwards non-IPI counters are only incremented in sys/kern/kern_intr.c: intr_event_handle(), so by this metric the counters for non-IPI interrupts are perfectly reasonable to have in struct intr_event.

Meanwhile the increments outside of intr_event_handle() and the allocations you're objecting to are strictly for IPI interrupts. This suggests two approaches:

  1. IPI interrupts are too distinct from non-IPI interrupts and they need to be severed from each other
  2. IPI interrupt handling needs more merging with non-IPI interrupts

That really does explain all the messiness presently in D36610. The two interrupt types have been partially, but not fully merged and the result is a mess. Unfortunately this is at least partially orthogonal to the goal of D36610.
There is certainly a lot of commonality with the two types of interrupt, yet perhaps they really do need to remain separate right now.

After some study, I agree that the intrcnt/intrnames interface is problematic. These counters belong to the machine-dependent interrupt frameworks (e.g. INTRNG), but the sysctls are implemented in the machine-independent intr_event code. This is a layering violation that forces MD code to allocate the counters and names in this inconvenient array format. An acceptable approach would be to eliminate the MI parts by moving (i.e. duplicating) the sysctl and ddb commands into each MD framework*. This will allow the implementations to evolve independently, while still maintaining the hw.intrnames/hw.intrcnt sysctl interface for userspace.

I really dislike the duplication of source code this would cause (also the interface of "hw.intrnames" is surprisingly brittle). Having played with this a bit it seems more useful to have the counters on struct intr_event. An advantage to this approach is stray interrupts which were generated inside an interrupt controller could be recorded, though nominally those shouldn't happen since the interrupt shouldn't be unmasked until after a struct intr_event was allocated.

I'm not absolutely against this approach, just the results of going the opposite direction seem rather superior to me.

Well, then you need to weigh your desire to get this work accepted against your belief of how the code should look. I've given you my major objections.

I hope I've given some good responses to at least some of those objections. I'm also considering the possibilities since the above gives me a good hint as to how I would do things.

Any argument based on saved/wasted memory needs to be backed by some form of before/after measurement in order to be taken seriously.

You seem intent on ignoring this bit.

I'm not. I haven't done proper measurements, so I'm estimating and due to the nature of estimates you're allowed to equate that to a concession. I suspect it will end up in the noise floor on many systems. In the present state there is increased use for IPIs, but decreased use due to removal of all the maintenance values and the intrnames copy.

I'm unsure which way you were confused when you typed that. Since find sys/sys sys/kern -type f -print0 | xargs -0 grep -eintr_event -l | sort =>

sys/kern/kern_clock.c
sys/kern/kern_intr.c
sys/kern/subr_intr.c
sys/sys/interrupt.h
sys/sys/intr.h

I think you meant kern_intr.c as struct intr_event is unrelated to kern_event.c.

Yes, thanks. I meant kern_intr.c and intr_event(9).

Yes, thanks. I meant kern_intr.c and intr_event(9).

Except this is a non-sequitur as intr_event(9) doesn't relate to anything here.


I was actually hoping for some feedback on what I stated afterwards since I was trying to point in the general direction of the alternative I would tend to prefer.

I think you meant kern_intr.c as struct intr_event is unrelated to kern_event.c. This is a good point, so taking a look at what I've got. Previously non-IPI interrupts had their counters incremented in 3 places, sys/powerpc/powerpc/intr_machdep.c: powerpc_dispatch_intr(), sys/kern/subr_intr.c: intr_isrc_dispatch() and sys/x86/x86/intr_machdep.c: intr_execute_handlers().
Afterwards non-IPI counters are only incremented in sys/kern/kern_intr.c: intr_event_handle(), so by this metric the counters for non-IPI interrupts are perfectly reasonable to have in struct intr_event.

Meanwhile the increments outside of intr_event_handle() and the allocations you're objecting to are strictly for IPI interrupts. This suggests two approaches:

  1. IPI interrupts are too distinct from non-IPI interrupts and they need to be severed from each other
  2. IPI interrupt handling needs more merging with non-IPI interrupts

That really does explain all the messiness presently in D36610. The two interrupt types have been partially, but not fully merged and the result is a mess. Unfortunately this is at least partially orthogonal to the goal of D36610.
There is certainly a lot of commonality with the two types of interrupt, yet perhaps they really do need to remain separate right now.

What I presently have in D36610 is more hospitable of approach #2. Allocating struct intr_event for each interrupt which gets reported. If IPI handling could be readily further merged then this is better (unfortunately I'm already looking at too many issues to attack this now).


If you're that much against allocating blank struct intr_events on a temporary basis, then what do you think of approach #1?

The function sysctl_intrnames() in kern_intr.c could be renamed to intr_event_sysctl_intrnames(), made global; and then give sysctl_intrcnt() similar treatment. Then add sysctl_intrnames()/sysctl_intrcnt() implementations to the architectures which report the IPIs from architecture tables and then call intr_event_sysctl_intrnames()/intr_event_sysctl_intrcnt().

This would keep the counters on struct intr_event instead of moving them to the interrupt_t structures. This has the useful advantage of stray software interrupts get reported (there is one presently lurking out there).

Actually, my alternative may simply be removing normal interrupts from intrcnt/intrnames, moving them to struct intr_event and adjusting sysctl_intrcnt()/sysctl_intrnames() to report both locations.

WIP. Initial proof of concept implementation of my take on the suggested approach. This has been tested on ARM64, but there is at least one issue of distinct concern here.

As threatened, this is pushing IPI and non-IPI interrupt implementations further apart. Does in fact provide some nice characteristics, at the cost of reducing commonality.

Note, this requires D36628 as this tickles that bug.

There are a bunch of things here which need review. I'm also wondering whether this needs to be split into several reviews. Problem is all the pieces have so much overlap. I'm also suspicious the commits are likely to need rearrangement too.

sys/kern/subr_intr.c
149–150

I chose to match what x86 and PowerPC are using. vmstat -i is supposed to be able to retrieve this from kernel core dumps, so matching seems the way to go.

214–215

Interestingly when/if D35898 goes through, the ii_names field which I had originally been suggesting to remove could be the basis of an alternative implementation of the sysctls. This in turn would allow purging the remains of the interrupt table.

1652–1697

This is what I had suggested. Use something near-identical to existing behavior for handling reporting of IPI interrupts, then use a hook into intr_event for reporting normal interrupts. Unfortunately this requires sintrcnt/sintrnames to match the size of the valid table, which means they get large increments.

sys/powerpc/powerpc/intr_machdep.c
161–162

I'm rather suspicious about this computation for the table size. I'm unfamiliar with how the interrupt counters work on PowerPC, but based on how INTRNG behaves this is way too few.

191–192

This was clearly copied from x86. I've got no idea what purpose this questionable interrupt served and when. Since there is no documentation, I suspect it is unused now.

sys/x86/x86/intr_machdep.c
180–184

As with PowerPC, I'm suspicious about this calculation. The result seems far too small, which leaves me suspicious not enough interrupt counters have been allocated for IPI interrupts, but in the past this was mitigated by lots of spare I/O interrupt counters. Now this will likely break.

Both Xen and Hyper-V allocate a set of IPI interrupt counters, but you're rather unlikely to have both on the same hardware at the same time.

201–203

Again, the mysterious questionable interrupt. No idea, so I'm opting to believe this had value at some time in the far distant past, but is unneeded now.

332–337

Taking advantage of the new intr_event_handle() returning the stray count. Previously callers had only cared about 0 versus !0, for whether to interpret the value as a stray. Now this provides a handy way for callers to provide limited messages like this.

I guess the goal of D36610 has moved a bit. What is really needed on this is feedback on the overall approach, then parts can be broken off as things clean up.

Moving the non-IPI interrupt counters to struct intr_isrc really does seem the way to go. ie_fullname isn't going to disappear (several uses in kern_intr.c, neither PowerPC nor x86 were storing them anywhere besides intrnames) so keeping the counters next to it seems worthwhile.

This fully shares them between the architectures instead of creating a poor interface. Crucially unlike 9b33b154b53, this moves them rather than creating duplicate copies.

I'm unsure how many Phabricator Diffs this will need in the end. I'm pretty sure the first few commits/patches will be:

  1. kern/intr: replace mutex with shared-exclusive lock
  2. kern/intr: add interrupt counters to intr_isrc
  3. kern/intr: switch intr_event_handle() to return stray count

After that move sysctl_intrnames()/sysctl_intrcnt() to architecture. Then intr_event_sysctl_intrnames()/intr_event_sysctl_intrcnt(), then nuking the old table handling. I would really like to leave sysctl_intrnames()/sysctl_intrcnt() in better shape, but I doubt I need that to accomplish the goal.

This of course assumes the split is acceptable. I note INTRNG has had issues with IPI versus non-IPI interrupts. In 2016 it did both, but that got broken off. Now IPI handling may be moving back into INTRNG. This does underscore IPI interrupts being quite different from other interrupt types.

I think I've got a viable commit series for this, but I do hope for some feedback before creating 8 Phabricator diffs. As noted, placing the counters onto struct intr_event makes them architecture-independent, but causes them to split from the IPI counters. I think this is reasonable since this is moving the counters, not merely copying them (this is the error made at rG9b33b154b53).

I reviewed the current version and had some comments written up. Overall this approach is a big improvement over allocating empty intr_event structures for IPIs. Treating these distinct types of interrupts differently is sensible, so I was on board with the direction.

However, the vmstat -M core -i use-case poses a real problem for this change. This debugging feature relies on being able to read the counters from intrcnt[] and intrnames[] post-mortem. By scattering I/O interrupt counters elsewhere, it breaks this reporting functionality. Even if said feature is exceedingly niche, we can't just break userspace. It would be possible to collect interrupt counter statistics into some secondary array at panic-time, and have newer vmstat binaries read from there, but at that point I think the added complexity exceeds any tangible benefit.

So, I must say that IMO this is the final nail in the coffin for trying to make this counter array interface nicer. I think it needs to be left alone, and the inconveniences it poses need to be worked around in other ways.

sys/kern/subr_intr.c
149–150

I was not aware of this functionality... this actually puts some real limitations on how the interface can be changed. For example, moving regular I/O interrupt counts out of the intrcnt array totally breaks reading those counts from a crash dump.

I will need to think about this.

sys/x86/x86/intr_machdep.c
180–184

IPI counters are registered in mp_ipi_intrcnt(), and it seems they need to be enabled with the COUNT_IPIS kernel config option (off by default). To me the count adds up, but the MPASS assertion in intrcnt_add() should help you verify this.

201–203

I would speculate the same.

So, I must say that IMO this is the final nail in the coffin for trying to make this counter array interface nicer. I think it needs to be left alone, and the inconveniences it poses need to be worked around in other ways.

To that end I have posted D38437, which should unblock you regarding intr_isrc_deregister().

ehem_freebsd_m5p.com added a subscriber: kib.

However, the vmstat -M core -i use-case poses a real problem for this change. This debugging feature relies on being able to read the counters from intrcnt[] and intrnames[] post-mortem. By scattering I/O interrupt counters elsewhere, it breaks this reporting functionality. Even if said feature is exceedingly niche, we can't just break userspace. It would be possible to collect interrupt counter statistics into some secondary array at panic-time, and have newer vmstat binaries read from there, but at that point I think the added complexity exceeds any tangible benefit.

I believe only sysctl("hw.intrnames") and sysctl("hw.intrcnt") are the only supported userspace interface. The use by vmstat is debugging functionality which needs to adapt to kernel changes, rather than trying to preserve kernel data structures to suit unusual userspace programs. Reading the git history of usr.bin/vmstat/vmstat.c is a long list of adjustments to adapt to kernel changes. For this portion in particular fd036deac16 was the last time it was changed and the change was incompatible with earlier. That suggests this is freely changeable, just vmstat needs to be updated.

I believe @jhb or @kib would have better insight. The complexity involved in maintaining this interface suggests to me it is time to go.

Everything split off, new state as single commit from series.

ehem_freebsd_m5p.com retitled this revision from intr: remove intrcnt/intrnames and move counters to intr_event to kern/intr: implement intrcnt/intrnames sysctl from event counters.Feb 8 2023, 9:20 PM
ehem_freebsd_m5p.com edited the summary of this revision. (Show Details)

Some of this network of commits could be useful without the core change. Check the "Stack" tab for the entire commit network.

sys/kern/kern_intr.c
1706

One item here, does this actually need atomic_load_long()? The counter could nominally be changing rapidly, so returning a snapshot is nice; however, the snapshot is immediately out of date.

1716

Same question as above.

I don't know what problem this is trying to solve and definitely don't want to read the entire review, so I have no further comments for the time being.

sys/kern/kern_intr.c
1391–1392

As I mentioned in another review atomics are incredibly costly are should be avoided if possible. Counting stuff is a prime example.

Bouncing cachelines to update things is also costly, which is why normally there is cpu-private fields guranteed to not false share with other cpus.

1668

while strictly speaking it is legal to hold sx locks across copyin/copyout, it is a bad idea

user memory has to be assumed to be backed with devices/filesystems with indefinite latency.

what you normally want to do is either malloc to have something to copy out (if small enough) or wire user pages, eagain before you take the event_lock.

ehem_freebsd_m5p.com added inline comments.
sys/kern/kern_intr.c
1668

I concede the copyout could be very slow. Good news is the lock is only needed exclusively for intr_event_create() and intr_event_destroy(). Most systems those will only be called during system start/shutdown when it isn't too likely for users to be present to try to retrieve this data. The type of system most likely to see those called during normal operation are VM control systems, which better not have many logged in users (as they are a sensitive area).

I halfway wondered whether it might be safe not to have the lock during this operation, but it isn't. The concern is of the entry being copied out being the one targeted for destruction or a torn read during the list modification. The primary defense here is all uses being fairly rare.

sys/x86/x86/intr_machdep.c
201–203

This was broken off into D37869, since this isn't strictly required for this task.

The end goal is getting INTRNG to be able to release interrupts. The goal of this series is to mostly remove intrcnt/intrnames as they're what cause trouble with releasing interrupts. For that matter the tables themselves are a layering violation and they force the architecture interrupt handling functionality to work around how they were designed.

The alternative D38437 has been proposed, but I'm not quite ready to give up on this approach (those tables are trouble).

sys/kern/kern_intr.c
1668

it's not about being fast or slow, but the fact that it can hang for an indefinite amount of time. trivial example: a funny user mmaps() an nfs-backed file and invokes the sysctl with that area as output. if, say, network goes down, this is going to sit there.

Again, this is a non-starter and is not hard to avoid.

sys/kern/kern_intr.c
1668

I've got no fundamental objections to either proposed alternative. Right now though I'm looking at the issue of updating vmstat to continue working its magic.

Ultimately this is a case of which failure mode is preferred. The interrupt list could be long (there was a mention of a board with 90K interrupts). Though we're no longer in the era when 256MB of memory was huge, thus now 4MB temporary isn't that awful.

sys/kern/kern_intr.c
1668

I should also point out intr_event_create() already has a malloc(,, M_WAITOK); so it can already hang for an indefinite amount of time.

sys/kern/subr_intr.c
149–150

I believe what could be claimed to be supported functionality would be vmstat -M kernelcore -i should work. Otherwise intrcnt looks very much like an albatross at this point and prevents any useful redesign work being done.

Updating in light of D38437. Updating based on inspiration from "hw.intrs", could be notably faster.

sys/kern/kern_intr.c
1668

I note this is fairly similar to the x86's implementation of sysctl("hw.intrs") (in sys/x86/x86/intr_machdep.c, function sysctl_hw_intrs()). That implementation uses a shared-exclusive lock in the exact same way. That also inspired a bit of change here to take care of a concern I already had.

ehem_freebsd_m5p.com marked an inline comment as done.

Hmm, check the source and seems I was wrong about how this worked. Update to fix issue.

This wasn't my goal, but it looks like this series, D38451 in particular fixes the intr_isrc_dispatch() call in sys/arm/mv/gpio.c. Since the counters get moved to the event, it ends up with functioning counters, plus no uninitialized counters get incremented (the function mv_gpio_intr_handler()).