Page MenuHomeFreeBSD

kern/intr: remove "irq" from kernel event API
Needs ReviewPublic

Authored by ehem_freebsd_m5p.com on Oct 15 2021, 6:35 AM.

Details

Reviewers
jhb
Summary

The concept of the IRQ isn't needed at this layer. The IRQ is something
which belongs to the particular hardware, not the innermost layer of the
kernel. Put another way intr_event is the device/interrupt source side
of the interrupt, "IRQs" are an interrupt controller concept.

This largely reverts commit 9b33b154b531606eccfc091faf50467eca0573f1.
Many places originally modified no longer exist and a number of new
locations have been added. This is still reverting a large portion of
that commit.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 42905
Build 39793: arc lint + arc unit

Event Timeline

I'm unsure whether to keep this as separate commits, versus pushing them to Phabricator as a blob. Since they're in my tree as individual chunks, they can be readily resubmitted individually if a reviewer thinks that is best. Phabricator seems rather weak at handling commit series.

Pulling in the entire intended chunk.

Consider this as a WIP suggestion. I fully expect changes.

The observation seems fairly straightforward. Having the irq in intr_event is just wrong. The kernel interrupt/event code doesn't have a table of interrupts and thus cannot lookup interrupts efficiently.

I came to this conclusion when thinking about how the software corresponds to hardware. struct intr_event roughly represents the interrupt source and the device which needs attention from the main processor. struct intr_irqsrc roughly represents a pin on an interrupt controller. This changes with MSIs, but most sources have only a single pin and have no concept of multiple interrupts; it is the controller which defines the interrupt numbering.

As such I believe it is the controller side (struct intr_irqsrc) which should care about irq numbering and the event core shouldn't touch them at all.

This is particularly important with hot-plug controllers, where a given device might get randomly swapped between interrupts. A real example of this is the Xen event channel where on suspend the entire interrupt mapping is lost and has to be recreated.

My apologies to @ae and linuxkpi for the difficult implementation, but upon examining the situation it just looked really wrong.

I should also note having the interrupt number in struct intr_event is low value since it simply duplicates other locations. Since kern_intr.c doesn't have a table, it cannot do lookups quickly unlike all other locations.

mhorne added subscribers: jhb, mhorne.

I think I agree with you that the concept of an IRQ does not really belong to this layer, so tracking it in struct intr_event is 'wrong' in an academic sense. That said, there is clearly some practical benefit to doing so, as this patch complicates several callers which are currently quite simple. Part of the problem is the fact that we use different interrupt frameworks on different architectures, but this is the current state of things.

Let me ask this: is it important that intr_lookup() be fast? If so, why?

It is also worth noting that - despite the comment in sys/interrupt.h - ie_irq does not always refer to a physical IRQ line. For INTRNG platforms, logical/virtual IRQs are handed out by the framework and they don't necessarily correspond to any real IRQ number as visible to the PIC(s) in the system.

Maybe @jhb can give some informed input.

sys/kern/kern_cpuset.c
2093

It is not acceptable to remove this functionality. This (and the hunk above) is what enables userspace to get/set interrupt affinity using cpuset(2).

Updating to hopefully improve things...

I think I agree with you that the concept of an IRQ does not really belong to this layer, so tracking it in struct intr_event is 'wrong' in an academic sense. That said, there is clearly some practical benefit to doing so, as this patch complicates several callers which are currently quite simple. Part of the problem is the fact that we use different interrupt frameworks on different architectures, but this is the current state of things.

Question is how best to address this situation. Perhaps the updated approach might be acceptable?

Let me ask this: is it important that intr_lookup() be fast? If so, why?

I guess I don't actually know. Both x86 and INTRNG use implementation which allow the operation to be fast. Whereas PowerPC is using an approach which will be slower.

Hopefully the update is a notable improvement. Perhaps adding compatibility will push the merging process along. Problem is it looks like not all MIPS devices use INTRNG and I'm unsure how to properly address those. The pointer to pointer return was chosen since that should be of value to hot-plug interrupt controllers (physical ones might be rare, but software/VM ones are common).

sys/kern/kern_cpuset.c
2093

I was worried about this. Upon examination though I realized I was mixing things up and hopefully the update addresses this in an acceptable way.

I'm removing mips in a few weeks. If there is a hassle due to mips, it won't be a hassle for long. Unless I've misread this, I don't see this change being MFCd.

Okay, that will simplify things. I don't see this being MFC'd either. I dislike breaking MIPS before then though.

hselasky added inline comments.
sys/dev/mlx5/mlx5_en/mlx5_en_main.c
38

These headers should be included by the LinuxKPI.

sys/sys/intr_compat.h
37

Should these two functions be put into a C-file, because not all kmods compiled outside the kernel tree carry the kernel options, thinking about INTRNG.

sys/dev/mlx5/mlx5_en/mlx5_en_main.c
38

Nevermind.

ehem_freebsd_m5p.com marked 2 inline comments as done.

Update to match comment.

My one concern is what effect will this have on attempting to unify interrupt systems on FreeBSD? Will this bring the goal closer (by starting on a hopefully acceptable API)? Will this push the goal away (by working around the immediate issue and relieving pressure)?

I hope it is the former, but it could be the latter. This should be adequate for the wacky idea which came to mind.

sys/sys/intr_compat.h
37

Good point.

The one which was changing has now been split into appropriate locations. The other is simply a macro which doesn't change. Hopefully this adjustment is acceptable.

ehem_freebsd_m5p.com marked an inline comment as done.

Fix style quirk.

The term IRQ is already in various other places (e.g. SYS_RES_IRQ) so it's not clear to me how this is a gain? It might be helpful to understand the larger context of why you want to do this? FWIW, I had wanted the cpuset for interrupts to take a string, not an integer ID and to do a substring match of the interrupt name.

sys/sys/intr_compat.h
36

Why does this return a struct intr_event ** instead of struct intr_event *? It seems like all the MD routines could just be a direct implementation of intr_get_event instead?

sys/sys/intr_compat.h
36

All implementations of the interrupt code already allow this level of access. So far I don't believe anyone has made use of it, but I see a potential use. Notably a dynamic interrupt controller might be implemented by swapping intr_event structures around. Right now this is merely a thought, but this actually seems like an elegant approach.

In D32504#734992, @jhb wrote:

The term IRQ is already in various other places (e.g. SYS_RES_IRQ) so it's not clear to me how this is a gain? It might be helpful to understand the larger context of why you want to do this? FWIW, I had wanted the cpuset for interrupts to take a string, not an integer ID and to do a substring match of the interrupt name.

I had hoped the summary covered that. Having ie_irq in struct intr_event is wrong. For all non-MIPS devices, the interrupt number is elsewhere and ie_irq is simply a redundant copy. For MIPS devices using INTRNG, INTRNG has the value and ie_irq is a redundant copy. Some MIPS devices have interrupt code which is sufficiently primitive it doesn't duplicate anything, but if MIPS is going away ie_irq is simply duplicating data held elsewhere.

A driver for a true dynamic/hotplug interrupt controller could implement interrupt changes by swapping struct intr_event pointers around. If such a controller ended up with the interrupt range 3971-8066 and during a suspend, controller interrupts #3 and #4 got swapped, this could be handled by simply swapping the intr_events for interrupts #3974 and #3975. Having ie_irq on the event makes this harder.