sys/kern/subr_intr.c
552	Presumably we shouldn't do this if `error != 0`? We have isrc_event_destroy(), but it is commented out with an `#ifdef notyet`. Seeing that this function isn't called, though, I think it'd be ok to merge the two. It looks like intr_isrc_deregister() should also clear the `isrc_event` pointer.

ehem_freebsd_m5p.com added inline comments.Feb 15 2023, 1:28 AM

sys/kern/subr_intr.c
552	Huh, I may have noticed that function before, but my eyes glazed over. Should the event be destroyed before removing from the interrupt table? Should the event be destroyed after removing from interrupt table? Both of the steps can fail, though for distinct reasons. Whichever is done first needs to succeed if passed a potentially released isrc. Notably `isrc_free_irq()` would need to return 0 if `isrc->isrc_irq >= intr_nirq` (since its interrupt has been freed, just it was already done).

Updating to my present tree status. Problem is I see a few ways to go with this and I need advice on which to choose.

Harbormaster completed remote builds in B50667: Diff 119637.Mar 29 2023, 7:01 PM

I'm looking at several issues surrounding D38599 and I'm unsure which direction to go with them.

First, two things are being released. The interrupt table entry (isrc_free_irq()) and the event (intr_event_destroy()). I'm unsure which should be released first. There seems a good argument for releasing the event first since this is the reverse order of allocation (unfortunately what is presently here does the reverse).

Second, this means all the release functions need to return success if nothing is done. D38602 does this for intr_event_destroy(), but there are two failure paths in isrc_free_irq() which need addressing. I've got one of those here, but what of the second? I suspect it is better to use

if (irq_source[isrc->isrc_irq] == isrc)
    irq_source[isrc->isrc_irq] = NULL;

Since that recovers from ->isrc_irq getting garbled (or the table entry being garbled). This though brings the question, should this be another differential? Should this be part of D38599?

With no activity, hopefully doing an update will provoke activity.

Most of the kernel appears to treat NULL as unallocated and not an error. Since this could fail for two reasons, I've decided ->isrc_irq being invalid simply indicates the table entry was released and is not an error. This means if a release fails, intr_isrc_deregister() can be called again and may succeed.

Harbormaster completed remote builds in B52111: Diff 123356.Jun 17 2023, 12:14 AM

In D38599#923991, @ehem_freebsd_m5p.com wrote:

With no activity, hopefully doing an update will provoke activity.

Most of the kernel appears to treat NULL as unallocated and not an error.

Are you referring to functions like free()? It depends. free() is called very frequently in error handling paths, so having free(NULL) be a no-op is a convenience. Most rarely used allocators do not have such an affordance. uma_zdestroy(NULL) is a bug, as is kmem_free(NULL). I think this is the right tradeoff, since the permissive behaviour can hide bugs, and calls to free() or m_free() are a lot more common than uma_zdestroy() or kmem_free().

Since this could fail for two reasons, I've decided ->isrc_irq being invalid simply indicates the table entry was released and is not an error. This means if a release fails, intr_isrc_deregister() can be called again and may succeed.

But when is this behaviour useful? Without some concrete example demonstrating otherwise, I'd presume that isrc_free_irq() or intr_event_destroy() fails only because of a bug in the consumer, in which case it seems preferable to fail explicitly.

In D38599#926723, @markj wrote:

In D38599#923991, @ehem_freebsd_m5p.com wrote:

With no activity, hopefully doing an update will provoke activity.

Most of the kernel appears to treat NULL as unallocated and not an error.

Are you referring to functions like free()? It depends. free() is called very frequently in error handling paths, so having free(NULL) be a no-op is a convenience. Most rarely used allocators do not have such an affordance. uma_zdestroy(NULL) is a bug, as is kmem_free(NULL). I think this is the right tradeoff, since the permissive behaviour can hide bugs, and calls to free() or m_free() are a lot more common than uma_zdestroy() or kmem_free().

Indeed. This is more relevant to D38602 though. The issue effecting D38599 is should intr_isrc_deregister() check isrc->isrc_event != NULL before calling intr_event_destroy(isrc->isrc_event)?

If intr_event_destroy(NULL); is not an error, then intr_isrc_deregister() can unconditionally call it. If that is an error, then intr_isrc_deregister() needs to check for NULL and then intr_event_destroy() repeats exactly the same check.

The concern is intr_event_destroy() returns EBUSY if a handler is still present; a plausible recovery for intr_event_destroy() returning any error is to wait for 1 second then call intr_event_destroy() again, yet if intr_event_destroy(NULL); produces an error, this will never recover. Perhaps intr_event_destroy(NULL); should produce a warning, but returning an error indication seems unlikely to be useful.

I can see going either way, but my initial inclination is for intr_event_destroy(NULL); not to return an error.

In D38599#926723, @markj wrote:

In D38599#923991, @ehem_freebsd_m5p.com wrote:

Since this could fail for two reasons, I've decided ->isrc_irq being invalid simply indicates the table entry was released and is not an error. This means if a release fails, intr_isrc_deregister() can be called again and may succeed.

But when is this behaviour useful? Without some concrete example demonstrating otherwise, I'd presume that isrc_free_irq() or intr_event_destroy() fails only because of a bug in the consumer, in which case it seems preferable to fail explicitly.

With the present implementation, isrc_free_irq() could be called and return successfully. intr_event_destroy() could then return EBUSY and a plausible recovery would be to add the entry to a list for calling intr_isrc_deregister() later (recovering from a race between handler removal and source release). Without this change a second call to isrc_free_irq() would produce an error since the entry had already been removed from the table and recovery would be impossible.

As such, whichever step is done first must be successful if the step was already accomplished.

In D38599#926782, @ehem_freebsd_m5p.com wrote:

In D38599#926723, @markj wrote:

In D38599#923991, @ehem_freebsd_m5p.com wrote:

With no activity, hopefully doing an update will provoke activity.

Most of the kernel appears to treat NULL as unallocated and not an error.

Are you referring to functions like free()? It depends. free() is called very frequently in error handling paths, so having free(NULL) be a no-op is a convenience. Most rarely used allocators do not have such an affordance. uma_zdestroy(NULL) is a bug, as is kmem_free(NULL). I think this is the right tradeoff, since the permissive behaviour can hide bugs, and calls to free() or m_free() are a lot more common than uma_zdestroy() or kmem_free().

Indeed. This is more relevant to D38602 though. The issue effecting D38599 is should intr_isrc_deregister() check isrc->isrc_event != NULL before calling intr_event_destroy(isrc->isrc_event)?

If intr_event_destroy(NULL); is not an error, then intr_isrc_deregister() can unconditionally call it. If that is an error, then intr_isrc_deregister() needs to check for NULL and then intr_event_destroy() repeats exactly the same check.

The concern is intr_event_destroy() returns EBUSY if a handler is still present; a plausible recovery for intr_event_destroy() returning any error is to wait for 1 second then call intr_event_destroy() again, yet if intr_event_destroy(NULL); produces an error, this will never recover. Perhaps intr_event_destroy(NULL); should produce a warning, but returning an error indication seems unlikely to be useful.

I do not think this is a plausible recovery path for FreeBSD. We usually do not take any great pain to avoid panicking when invariants fail to hold; if there is a race between handler deregistration and event source destruction, it is better to fix the race than to add workarounds for it. There are certainly reasonable counterexamples to this, but the scenario you're describing is so far hypothetical to me.

I can see going either way, but my initial inclination is for intr_event_destroy(NULL); not to return an error.

In D38599#926723, @markj wrote:

In D38599#923991, @ehem_freebsd_m5p.com wrote:

Since this could fail for two reasons, I've decided ->isrc_irq being invalid simply indicates the table entry was released and is not an error. This means if a release fails, intr_isrc_deregister() can be called again and may succeed.

But when is this behaviour useful? Without some concrete example demonstrating otherwise, I'd presume that isrc_free_irq() or intr_event_destroy() fails only because of a bug in the consumer, in which case it seems preferable to fail explicitly.

With the present implementation, isrc_free_irq() could be called and return successfully. intr_event_destroy() could then return EBUSY and a plausible recovery would be to add the entry to a list for calling intr_isrc_deregister() later (recovering from a race between handler removal and source release). Without this change a second call to isrc_free_irq() would produce an error since the entry had already been removed from the table and recovery would be impossible.

As such, whichever step is done first must be successful if the step was already accomplished.

Absent some examples grounded in a specific driver to motivate the discussion, my view is that it's better not to bother thinking about this at all. If intr_event_destroy() returns EBUSY in a particular scenario, then the caller should be fixed.

markj mentioned this in D38602: kern/intr: have intr_event_destroy() return success on NULL event.Aug 16 2023, 9:35 PM

In D38599#944958, @markj wrote:

Absent some examples grounded in a specific driver to motivate the discussion, my view is that it's better not to bother thinking about this at all. If intr_event_destroy() returns EBUSY in a particular scenario, then the caller should be fixed.

Switching to e-mail since that is a larger thread. Basically an almost ready driver calls intr_deregister_source() in a non-error path. Therefore I'm examining intr_deregister_source() and finding bugs, then finding more bugs and yet more bugs. Clearly it hasn't been well-exercised before.

All of these should be obvious from what INTRNG does. Failing that, an example was provided. So, a few call sequences:

intr_isrc_register()
intr_setup_irq()
intr_teardown_irq()
intr_isrc_deregister()

Unless the caller manually calls intr_event_destroy(isrc->isrc_event), a struct intr_event will be leaked. INTRNG allocates the struct intr_event, but lacks any provision to release them.

intr_isrc_register()
intr_setup_irq()
intr_isrc_deregister()
intr_teardown_irq()
intr_isrc_deregister()

Sure, this is a questionable call sequence. A multi-threaded device driver could do this if different threads handled different bits. Problem is during the second intr_isrc_deregister(), isrc_free_irq() will return EINVAL since it has already been removed from the table. This would block intr_event_destroy() being called, as such intr_free_irq() needs to produce a successful return before that call. Yet despite everything having been properly released, the functions still return errors.

intr_isrc_register()
intr_isrc_deregister()

INTRNG presently does lazy allocation of ->isrc_event. As such intr_event_destroy() will return EINVAL and we've triggered D38602. The issue for D38602 is quite simply, the caller really wants to know "has everything been properly cleaned up?" A NULL event isn't a proper state, but it does suggest everything has been cleaned up.

Presently the implementation I've got will log an error and drop everything on the floor. Better to leak memory, but remain mostly operational than panic() for something which will take a while to become truly problematic.