Page MenuHomeFreeBSD

lapic_handle_intr: KASSERT isrc != NULL
Needs ReviewPublic

Authored by cperciva on Sat, Mar 14, 1:08 AM.

Details

Reviewers
kib
jhb
emaste
Summary

If an interrupt arrives at a CPU which isn't expecting that particular
vector, intr_lookup_source will return an isrc of NULL and we'll panic
when intr_execute_handlers increments *isrc->is_count.

Place a KASSERT a few nanoseconds earlier in order to leave some more
breadcrumbs for the next person to trip over this behaviour.

Tested on: EC2 r8i.96xlarge
MFC after: 1 month

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 71395
Build 68278: arc lint + arc unit

Event Timeline

This is triggered by pointing an IRQ at lapic > 256, e.g. cpuset -l 193 -x 4 on an r8i.96xlarge system (note that on this system, CPUs 192--383 have APIC IDs 256--447).

I'll be making that not panic as a separate commit.

sys/x86/x86/local_apic.c
1448

We don't ever assert that something is not NULL in kernel, relying on the hardware to trap.

Then, wouldn't the right action there be only calling intr_execute_handlers() for isrc != NULL?

sys/x86/x86/local_apic.c
1448

We don't ever assert that something is not NULL in kernel, relying on the hardware to trap.

Well, it does trap. But having this KASSERT with its useful error message would have saved me a while trying to figure out why the NULL dereference was happening.

Then, wouldn't the right action there be only calling intr_execute_handlers() for isrc != NULL?

Maybe, but I feel like interrupts being routed to a CPU which isn't expecting them is more of a "panic" issue than a "silently ignore" issue.

sys/x86/x86/local_apic.c
1448

Then perhaps the diagnostic should be printed in the place where we were unable to configure the interrupt source.
The relevant ioapic function should propagate the error back, it should return an error instead of being void, perhaps?

sys/x86/x86/local_apic.c
1448

Well, the bug I tripped over was that we were configuring the interrupt source, but we were configuring it *wrong* -- sending interrupts to a different CPU from the one we were intending.

I'm fixing that bug. I just wanted to leave some breadcrumbs in the form of a more useful message in case the same problem ever arises again.