Page MenuHomeFreeBSD

kern/intr: switch to allocating handles forward
AbandonedPublic

Authored by ehem_freebsd_m5p.com on Mar 17 2021, 5:00 AM.

Details

Summary

isrc_alloc_irq() had been allocating handles in order and scanning the
whole table before failing. In case of failure irq_next_free was set to
maxirqs to avoid repeated full scans of the table. Unfortunately
isrc_free_irq() needed to reset irq_next_free and did not, hence once
full allocations were permanently short-circuited.

As the comment before isrc_alloc_irq() mentions, there is potential for
a slow allocation attempt loop failure scenario. In this light, simply
modify irq_next_free to always indicate the lowest available interrupt.
Since irq_next_free can still equal intr_nirq the fast fail scenario
still functions.

A driver which tries to allocate two interrupts when only one is
available (and frees the first during failure recovery) will be much
faster since only part of the table will be scanned.

This also simplifies the source code.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 39084
Build 35973: arc lint + arc unit

Event Timeline

"Hey! Look what I found when I happened to be looking at subr_intr.c!"

While this might not occur that often, locking all future interrupt allocations strikes me as a Bad Thing(tm).

Add a log warning for when an allocation fails due to full table. This
limits log messages by only logging the first failure and not
subsequent ones. Though if entries are freed then the message can occur
again.

This bug appears to be present in most post-2016 versions of FreeBSD. Notably it is present in 686450c8984c60f2c21eb7c64afe0eac72244db0, when ARM_INITRG was initially imported and the functions were isrc_alloc_irq_locked() and isrc_free_irq(). As such this appears applicable to all currently active branches of FreeBSD.

I don't know how often this gets triggered though. I spotted it due to looking at subr_intr.c since I need to make use of some functionality and that was a handy way to figure out what I needed. Could be this rarely occurs in the field or could be this occurs fairly frequently, but tends to manifest as mysterious crashes which don't show quite often enough to cause that much trouble.

mmel requested changes to this revision.Mar 18 2021, 12:28 PM
mmel added inline comments.
sys/kern/subr_intr.c
413

log () is not the standard way to log kernel problems. We use printf() or device_printf() for this. In addition, in this case, I think that the requester is responsible for reporting the problem.

438

This is wrong. the irq_next_free is used as start index of full table lookup -> see second cycle in isrc_alloc_irq() and (mainly) a comment above this function . This patch violates assumption in mentioned comment. But I don't see where exactly is problem.

This revision now requires changes to proceed.Mar 18 2021, 12:28 PM

I'm up for the requester being responsible for reporting. The olde
"report whenever a problem occurs" instinct was kicking in.

I keep waving back and forth as to whether it is better to always set irq_next_free to isrc->isrc_irq, versus doing so conditionally. Right now I'm finding myself favoring unconditional, but it likely doesn't make too much difference.

maxirqs really needs to be ripped out of isrc_alloc_irq(). Too easy to get confused and grab maxirqs instead of the actual file variable intr_nirq.

sys/kern/subr_intr.c
438

You're missing what occurs in the table of interrupts fills. If isrc_alloc_irq() finds the table is presently full (it falls through the second loop), then it will set:

irq_next_free = maxirqs;

This in turn causes isrc_alloc_irq() to exit early at line 409. This means the table will never be rescanned and isrc_alloc_irq() will return ENOSPC no matter how many interrupts are released.

If irq_next_free >= intr_nirq, then the table is presently known to be full and scanning is currently disabled. As such setting irq_next_free to something less than intr_nirq is required to restart scanning. The value isrc->isrc_irq is appropriate since it is a now known to be available and the first entry the table scan looks at will be available.

I can even argue for the approach of unconditionally setting irq_next_free = isrc->isrc_irq, since it may make isrc_alloc_irq() faster by guaranteeing irq_next_free is presently pointing to an available entry. This avoids a branch and read from memory. On the flip side this potentially makes allocation order a bit interesting.

I don't see a problem with the patch in theory, but it seems impossible to usefully trigger this condition. If the system has truly exhausted its list of irqs, then you have a problem whether irq allocations can continue or not.

sys/kern/subr_intr.c
435

Existing comments in the file are, for the most part, capitalized and end with a period.

sys/kern/subr_intr.c
438

You're missing what occurs in the table of interrupts fills. If isrc_alloc_irq() finds the table is presently full (it falls through the second loop), then it will set:

irq_next_free = maxirqs;

This in turn causes isrc_alloc_irq() to exit early at line 409. This means the table will never be rescanned and isrc_alloc_irq() will return ENOSPC no matter how many interrupts are released.

Ahh, yes, sure. I overlooked the 'if' condition on line 409, sorry for this.

I can even argue for the approach of unconditionally setting irq_next_free = isrc->isrc_irq, since it may make isrc_alloc_irq() faster by guaranteeing irq_next_free is presently pointing to an available entry. This avoids a branch and read from memory. On the flip side this potentially makes allocation order a bit interesting.

I strongly disagree with that. The original goal of the proposal is to detach the allocation order from the release order. For me, it is part of the (implicit) security of standard code, although we are talking more or less about a hypothetical situation. Also this code is not on hot-path.

441

What about this?

	/*
	 * If we are recovering from a full isrc table state, next allocation
	 * should perform a full table scan. This ensures that order of
	 * allocations is separated from the release sequences.
	 */
	if (irq_next_free >= intr_nirq)
		irq_next_free = 0;

The argument for always setting irq_next_free is it guarantees the next call to isrc_alloc_irq() will terminate quickly. The odds of the entry currently designated by irq_next_free being empty is a function of how full the table is.

For ideal performance, we really want irq_next_free to be left at the start of a large block of free entries. This would have the next several calls to isrc_alloc_irq() return quickly. Issue is, do entries tend to be deallocated in same order as allocation, or the reverse order?

I suspect it is likely reverse order, in which case always setting irq_next_free is the way to go as it significantly reduces time spent scanning the table.

sys/kern/subr_intr.c
438

I can even argue for the approach of unconditionally setting irq_next_free = isrc->isrc_irq, since it may make isrc_alloc_irq() faster by guaranteeing irq_next_free is presently pointing to an available entry. This avoids a branch and read from memory. On the flip side this potentially makes allocation order a bit interesting.

I strongly disagree with that. The original goal of the proposal is to detach the allocation order from the release order. For me, it is part of the (implicit) security of standard code, although we are talking more or less about a hypothetical situation. Also this code is not on hot-path.

I'll concede this shouldn't be hot-path, but that is partially due to the original authors paying attention to avoid performance issues. The very existence of irq_next_free points to them having paid attention. Instead of always scanning from the start, it scans from where it last left off. Instead of scanning the full table before determining it is full, it records that it is full.

441

If the table was full, then our newly freed entry will be the only available one. The next call to isrc_alloc_irq() will find it and terminate quickly. Setting to zero means the next isrc_alloc_irq() call will scan more than half the table, since the low-numbered entries are occupied by permanently allocated entries and then it will on average be halfway along (setting to intr_nirq/2 would likely offer measurably better performance).

ehem_freebsd_m5p.com marked an inline comment as done.

Addressing @mhorne's comment and bringing up a fourth potential option.

Adjusted to take care of one comment.

I'm fearful this is turning into a "bike shed" situation. There are four worthwhile approaches I see:

option 0:

	if (irq_next_free >= intr_nirq)
                 irq_next_free = isrc->isrc_irq;

option 1:

	if (irq_next_free >= intr_nirq)
		irq_next_free = 0;

option 2:

	irq_next_free = isrc->isrc_irq;

option 3:

	if (irq_next_free > isrc->isrc_irq)
                 irq_next_free = isrc->isrc_irq;

I find myself strongly against option 1. True this isn't hot-path, but a free shouldn't cause the next allocation to scan more than 50% of the table to find the single free entry (the first few entries will be static allocations, dynamic are likely to be later), when the single open entry is known to this code.

Option 0 versus option 2 is a bit more interesting. Question comes, do interrupts tend to be freed in the order they were allocated? Do they tend to be free in reverse of the order they were allocated? If the latter pattern is more common then option 2 is better, since the last interrupt to be freed will be the earliest and irq_next_free will tend to be left at the start of a block.

Option 3 has rather more dramatic effects. Option 3 would always keep irq_next_free indicating the first free entry in the table. In such case isrc_alloc_irq() no longer needs to scan the entire table to determine it is full. Simply scanning from irq_next_free to the end will be sufficient and the ENOSPC situation can be found earlier. This makes isrc_alloc_irq() shorter and simpler.

Since I also need to address @mhorne's comment, I'm submitting option 3. This may in fact be superior.

The rendering of that diff is hard to understand. Alas, that comment will now need adjustment. "Keep irq_next_free indicating first free entry."?

Ping.

Is an option besides 1 acceptable to @mmel? I do see some advantage to option 3, as it actually results in simplifying the code in isrc_alloc_irq().

EDIT: 0-based vs 1-based makes a difference.

sys/kern/subr_intr.c
405–406

I think there is no point in having maxirqs, just reference intr_nirq directly.

438

The performance of isrc_alloc_irq() after the table is full is not important, and arguably not important at all. It is better to do the simpler thing, which is to decouple isrc_alloc_irq() and isrc_free_irq() by having isrc_alloc_irq() do a full scan of the table after the cursor has wrapped around. In other words, I think isrc_alloc_irq() should simply set irq_next_free to 0 before returning ENOSPC, and isrc_free_irq() should not change at all.

sys/kern/subr_intr.c
405–406

There is a subtle gotcha here, pointed to by D29327. maxirqs is unsigned intr_nirq is signed. While it is unlikely for intr_nirq to be large enough for this to matter it could. In fact INTR_IRQ_INVALID is -1 with intr_nirq being signed.

Having typed that, what is presently in my development tree nukes maxirqs. I've been waiting for comments before finishing cleanup and updating D29310.

Now with a much larger commit message. Notably this approach distinctly shrinks isrc_alloc_irq(). I like preserving the features which enhance performance. I'll admit some of the simplification is due to D29327 making maxirqs completely unnecessary (its purpose was to provide an unsigned version of intr_nirq).

ehem_freebsd_m5p.com retitled this revision from kern/intr: restart interrupt allocations after free to kern/intr: switch to allocating handles forward.May 31 2021, 3:02 AM
ehem_freebsd_m5p.com edited the summary of this revision. (Show Details)

Any comments on what is currently present?

The approach of always allocating lowest first seems boring, but it is boring since it is quite effective and addresses the need perfectly. Does mean some test cases are harder to reach, but every approach is going to make some cases harder to test and other cases easier.

Any comments on what is currently present?

The approach of always allocating lowest first seems boring, but it is boring since it is quite effective and addresses the need perfectly. Does mean some test cases are harder to reach, but every approach is going to make some cases harder to test and other cases easier.

Do you have some new code that exercises these paths?

I still don't understand the emphasis on performance here. Allocation of irqs is very infrequent, and deallocation even more so. Any discussion of irq alloc/free patterns is mainly speculation until drivers start using intr_isrc_deregister().

IMO the original version of the patch was better, since it did only one thing: allow recovery from a full table without changing isrc_alloc_irq() at all. Removing maxirqs is a fine cleanup but orthogonal to the change itself.