Details

Reviewers

mhorne
gonzo
markj
kib
mmel
skra

Summary

isrc_alloc_irq() had been allocating handles in order and scanning the
whole table before failing. In case of failure irq_next_free was set to
maxirqs to avoid repeated full scans of the table. Unfortunately
isrc_free_irq() needed to reset irq_next_free and did not, hence once
full allocations were permanently short-circuited.

As the comment before isrc_alloc_irq() mentions, there is potential for
a slow allocation attempt loop failure scenario. In this light, simply
modify irq_next_free to always indicate the lowest available interrupt.
Since irq_next_free can still equal intr_nirq the fast fail scenario
still functions.

A driver which tries to allocate two interrupts when only one is
available (and frees the first during failure recovery) will be much
faster since only part of the table will be scanned.

This also simplifies the source code.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 39084
Build 35973: arc lint + arc unit

Event Timeline

ehem_freebsd_m5p.com created this revision.Mar 17 2021, 5:00 AM

Herald added subscribers: Contributor Reviews (src), imp. · View Herald TranscriptMar 17 2021, 5:00 AM

ehem_freebsd_m5p.com requested review of this revision.Mar 17 2021, 5:00 AM

Harbormaster completed remote builds in B37892: Diff 85878.Mar 17 2021, 5:00 AM

"Hey! Look what I found when I happened to be looking at subr_intr.c!"

While this might not occur that often, locking all future interrupt allocations strikes me as a Bad Thing(tm).

ehem_freebsd_m5p.com added reviewers: mmel, skra.Mar 17 2021, 5:34 AM

ehem_freebsd_m5p.com awarded a token.Mar 17 2021, 6:00 AM

ehem_freebsd_m5p.com rescinded a token.

Add a log warning for when an allocation fails due to full table. This
limits log messages by only logging the first failure and not
subsequent ones. Though if entries are freed then the message can occur
again.

Harbormaster completed remote builds in B37898: Diff 85917.Mar 17 2021, 3:42 PM

This bug appears to be present in most post-2016 versions of FreeBSD. Notably it is present in 686450c8984c60f2c21eb7c64afe0eac72244db0, when ARM_INITRG was initially imported and the functions were isrc_alloc_irq_locked() and isrc_free_irq(). As such this appears applicable to all currently active branches of FreeBSD.

I don't know how often this gets triggered though. I spotted it due to looking at subr_intr.c since I need to make use of some functionality and that was a handy way to figure out what I needed. Could be this rarely occurs in the field or could be this occurs fairly frequently, but tends to manifest as mysterious crashes which don't show quite often enough to cause that much trouble.

ehem_freebsd_m5p.com mentioned this in D2047: New ARM interrupt framework.Mar 17 2021, 4:11 PM

mmel requested changes to this revision.Mar 18 2021, 12:28 PM

mmel added inline comments.

sys/kern/subr_intr.c
414	log () is not the standard way to log kernel problems. We use printf() or device_printf() for this. In addition, in this case, I think that the requester is responsible for reporting the problem.
439	This is wrong. the irq_next_free is used as start index of full table lookup -> see second cycle in isrc_alloc_irq() and (mainly) a comment above this function . This patch violates assumption in mentioned comment. But I don't see where exactly is problem.

This revision now requires changes to proceed.Mar 18 2021, 12:28 PM

I'm up for the requester being responsible for reporting. The olde
"report whenever a problem occurs" instinct was kicking in.

Harbormaster completed remote builds in B37913: Diff 85964.Mar 18 2021, 2:59 PM

I keep waving back and forth as to whether it is better to always set irq_next_free to isrc->isrc_irq, versus doing so conditionally. Right now I'm finding myself favoring unconditional, but it likely doesn't make too much difference.

maxirqs really needs to be ripped out of isrc_alloc_irq(). Too easy to get confused and grab maxirqs instead of the actual file variable intr_nirq.

sys/kern/subr_intr.c
439	You're missing what occurs in the table of interrupts fills. If isrc_alloc_irq() finds the table is presently full (it falls through the second loop), then it will set: irq_next_free = maxirqs; This in turn causes isrc_alloc_irq() to exit early at line 409. This means the table will never be rescanned and isrc_alloc_irq() will return ENOSPC no matter how many interrupts are released. If irq_next_free >= intr_nirq, then the table is presently known to be full and scanning is currently disabled. As such setting irq_next_free to something less than intr_nirq is required to restart scanning. The value isrc->isrc_irq is appropriate since it is a now known to be available and the first entry the table scan looks at will be available. I can even argue for the approach of unconditionally setting irq_next_free = isrc->isrc_irq, since it may make isrc_alloc_irq() faster by guaranteeing irq_next_free is presently pointing to an available entry. This avoids a branch and read from memory. On the flip side this potentially makes allocation order a bit interesting.

I don't see a problem with the patch in theory, but it seems impossible to usefully trigger this condition. If the system has truly exhausted its list of irqs, then you have a problem whether irq allocations can continue or not.

sys/kern/subr_intr.c
436	Existing comments in the file are, for the most part, capitalized and end with a period.

mmel added inline comments.Mar 19 2021, 9:58 AM

sys/kern/subr_intr.c
439	You're missing what occurs in the table of interrupts fills. If isrc_alloc_irq() finds the table is presently full (it falls through the second loop), then it will set: irq_next_free = maxirqs; This in turn causes isrc_alloc_irq() to exit early at line 409. This means the table will never be rescanned and isrc_alloc_irq() will return ENOSPC no matter how many interrupts are released. Ahh, yes, sure. I overlooked the 'if' condition on line 409, sorry for this. I can even argue for the approach of unconditionally setting irq_next_free = isrc->isrc_irq, since it may make isrc_alloc_irq() faster by guaranteeing irq_next_free is presently pointing to an available entry. This avoids a branch and read from memory. On the flip side this potentially makes allocation order a bit interesting. I strongly disagree with that. The original goal of the proposal is to detach the allocation order from the release order. For me, it is part of the (implicit) security of standard code, although we are talking more or less about a hypothetical situation. Also this code is not on hot-path.
441	What about this? /* * If we are recovering from a full isrc table state, next allocation * should perform a full table scan. This ensures that order of * allocations is separated from the release sequences. */ if (irq_next_free >= intr_nirq) irq_next_free = 0;

The argument for always setting irq_next_free is it guarantees the next call to isrc_alloc_irq() will terminate quickly. The odds of the entry currently designated by irq_next_free being empty is a function of how full the table is.

For ideal performance, we really want irq_next_free to be left at the start of a large block of free entries. This would have the next several calls to isrc_alloc_irq() return quickly. Issue is, do entries tend to be deallocated in same order as allocation, or the reverse order?

I suspect it is likely reverse order, in which case always setting irq_next_free is the way to go as it significantly reduces time spent scanning the table.

sys/kern/subr_intr.c
439	I can even argue for the approach of unconditionally setting irq_next_free = isrc->isrc_irq, since it may make isrc_alloc_irq() faster by guaranteeing irq_next_free is presently pointing to an available entry. This avoids a branch and read from memory. On the flip side this potentially makes allocation order a bit interesting. I strongly disagree with that. The original goal of the proposal is to detach the allocation order from the release order. For me, it is part of the (implicit) security of standard code, although we are talking more or less about a hypothetical situation. Also this code is not on hot-path. I'll concede this shouldn't be hot-path, but that is partially due to the original authors paying attention to avoid performance issues. The very existence of irq_next_free points to them having paid attention. Instead of always scanning from the start, it scans from where it last left off. Instead of scanning the full table before determining it is full, it records that it is full.
441	If the table was full, then our newly freed entry will be the only available one. The next call to isrc_alloc_irq() will find it and terminate quickly. Setting to zero means the next isrc_alloc_irq() call will scan more than half the table, since the low-numbered entries are occupied by permanently allocated entries and then it will on average be halfway along (setting to intr_nirq/2 would likely offer measurably better performance).

ehem_freebsd_m5p.com mentioned this in D29327: kern/intr: declare interrupt vectors unsigned.Mar 19 2021, 9:34 PM

Addressing @mhorne's comment and bringing up a fourth potential option.

Harbormaster completed remote builds in B38329: Diff 86854.Apr 5 2021, 6:22 PM

Adjusted to take care of one comment.

I'm fearful this is turning into a "bike shed" situation. There are four worthwhile approaches I see:

option 0:

	if (irq_next_free >= intr_nirq)
                 irq_next_free = isrc->isrc_irq;

option 1:

	if (irq_next_free >= intr_nirq)
		irq_next_free = 0;

option 2:

	irq_next_free = isrc->isrc_irq;

option 3:

	if (irq_next_free > isrc->isrc_irq)
                 irq_next_free = isrc->isrc_irq;

I find myself strongly against option 1. True this isn't hot-path, but a free shouldn't cause the next allocation to scan more than 50% of the table to find the single free entry (the first few entries will be static allocations, dynamic are likely to be later), when the single open entry is known to this code.

Option 0 versus option 2 is a bit more interesting. Question comes, do interrupts tend to be freed in the order they were allocated? Do they tend to be free in reverse of the order they were allocated? If the latter pattern is more common then option 2 is better, since the last interrupt to be freed will be the earliest and irq_next_free will tend to be left at the start of a block.

Option 3 has rather more dramatic effects. Option 3 would always keep irq_next_free indicating the first free entry in the table. In such case isrc_alloc_irq() no longer needs to scan the entire table to determine it is full. Simply scanning from irq_next_free to the end will be sufficient and the ENOSPC situation can be found earlier. This makes isrc_alloc_irq() shorter and simpler.

Since I also need to address @mhorne's comment, I'm submitting option 3. This may in fact be superior.

The rendering of that diff is hard to understand. Alas, that comment will now need adjustment. "Keep irq_next_free indicating first free entry."?

Ping.

Is an option besides 1 acceptable to @mmel? I do see some advantage to option 3, as it actually results in simplifying the code in isrc_alloc_irq().

EDIT: 0-based vs 1-based makes a difference.

markj added inline comments.Apr 19 2021, 9:18 PM

sys/kern/subr_intr.c
405–406	I think there is no point in having `maxirqs`, just reference `intr_nirq` directly.
439	The performance of isrc_alloc_irq() after the table is full is not important, and arguably not important at all. It is better to do the simpler thing, which is to decouple isrc_alloc_irq() and isrc_free_irq() by having isrc_alloc_irq() do a full scan of the table after the cursor has wrapped around. In other words, I think isrc_alloc_irq() should simply set irq_next_free to 0 before returning ENOSPC, and isrc_free_irq() should not change at all.

ehem_freebsd_m5p.com added inline comments.Apr 20 2021, 3:19 AM

sys/kern/subr_intr.c
405–406	There is a subtle gotcha here, pointed to by D29327. `maxirqs` is unsigned `intr_nirq` is signed. While it is unlikely for `intr_nirq` to be large enough for this to matter it could. In fact INTR_IRQ_INVALID is -1 with `intr_nirq` being signed. Having typed that, what is presently in my development tree nukes maxirqs. I've been waiting for comments before finishing cleanup and updating D29310.

Now with a much larger commit message. Notably this approach distinctly shrinks isrc_alloc_irq(). I like preserving the features which enhance performance. I'll admit some of the simplification is due to D29327 making maxirqs completely unnecessary (its purpose was to provide an unsigned version of intr_nirq).

Harbormaster completed remote builds in B39084: Diff 88884.May 8 2021, 5:26 AM

ehem_freebsd_m5p.com retitled this revision from kern/intr: restart interrupt allocations after free to kern/intr: switch to allocating handles forward.May 31 2021, 3:02 AM

ehem_freebsd_m5p.com edited the summary of this revision. (Show Details)

Any comments on what is currently present?

The approach of always allocating lowest first seems boring, but it is boring since it is quite effective and addresses the need perfectly. Does mean some test cases are harder to reach, but every approach is going to make some cases harder to test and other cases easier.

In D29310#688060, @ehem_freebsd_m5p.com wrote:

Any comments on what is currently present?

The approach of always allocating lowest first seems boring, but it is boring since it is quite effective and addresses the need perfectly. Does mean some test cases are harder to reach, but every approach is going to make some cases harder to test and other cases easier.

Do you have some new code that exercises these paths?

I still don't understand the emphasis on performance here. Allocation of irqs is very infrequent, and deallocation even more so. Any discussion of irq alloc/free patterns is mainly speculation until drivers start using intr_isrc_deregister().

IMO the original version of the patch was better, since it did only one thing: allow recovery from a full table without changing isrc_alloc_irq() at all. Removing maxirqs is a fine cleanup but orthogonal to the change itself.

mmel mentioned this in rGa49f208d94b8: intrng: Releasing interrupt source should clear interrupt table full state..Jul 8 2021, 6:45 AM

mmel mentioned this in rGe88c3b1b02a6: intrng: remove now redundant shadow variable..Jul 8 2021, 6:58 AM

ehem_freebsd_m5p.com abandoned this revision.Aug 29 2021, 11:01 PM

mmel mentioned this in rGa3ccd06dd965: intrng: Releasing interrupt source should clear interrupt table full state..Jan 20 2022, 10:09 AM

mmel mentioned this in rG2ace1585b0f6: intrng: remove now redundant shadow variable..