lapic_handle_intr: KASSERT isrc != NULL
ClosedPublic
Actions

Authored by cperciva on Sat, Mar 14, 1:08 AM.

Details

Reviewers

kib
jhb
emaste

Commits

rGf350063a251c: clapic_handle_intr: KASSERT isrc != NULL

Summary

If an interrupt arrives at a CPU which isn't expecting that particular
vector, intr_lookup_source will return an isrc of NULL and we'll panic
when intr_execute_handlers increments *isrc->is_count.

Place a KASSERT a few nanoseconds earlier in order to leave some more
breadcrumbs for the next person to trip over this behaviour.

Tested on: EC2 r8i.96xlarge
MFC after: 1 month

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

cperciva created this revision.Sat, Mar 14, 1:08 AM

Herald added a subscriber: imp. · View Herald TranscriptSat, Mar 14, 1:08 AM

cperciva requested review of this revision.Sat, Mar 14, 1:08 AM

Harbormaster completed remote builds in B71395: Diff 173669.Sat, Mar 14, 1:08 AM

This is triggered by pointing an IRQ at lapic > 256, e.g. cpuset -l 193 -x 4 on an r8i.96xlarge system (note that on this system, CPUs 192--383 have APIC IDs 256--447).

I'll be making that not panic as a separate commit.

cperciva added a child revision: D55857: io_apic: Don't route to APIC ID > 255.Sat, Mar 14, 6:32 PM

kib added inline comments.Mon, Mar 16, 8:49 PM

sys/x86/x86/local_apic.c
1448	We don't ever assert that something is not NULL in kernel, relying on the hardware to trap. Then, wouldn't the right action there be only calling intr_execute_handlers() for isrc != NULL?

cperciva added inline comments.Mon, Mar 16, 9:11 PM

sys/x86/x86/local_apic.c
1448	We don't ever assert that something is not NULL in kernel, relying on the hardware to trap. Well, it does trap. But having this KASSERT with its useful error message would have saved me a while trying to figure out why the NULL dereference was happening. Then, wouldn't the right action there be only calling intr_execute_handlers() for isrc != NULL? Maybe, but I feel like interrupts being routed to a CPU which isn't expecting them is more of a "panic" issue than a "silently ignore" issue.

kib added inline comments.Mon, Mar 16, 10:00 PM

sys/x86/x86/local_apic.c
1448	Then perhaps the diagnostic should be printed in the place where we were unable to configure the interrupt source. The relevant ioapic function should propagate the error back, it should return an error instead of being void, perhaps?

cperciva added inline comments.Mon, Mar 16, 11:29 PM

sys/x86/x86/local_apic.c
1448	Well, the bug I tripped over was that we were configuring the interrupt source, but we were configuring it wrong -- sending interrupts to a different CPU from the one we were intending. I'm fixing that bug. I just wanted to leave some breadcrumbs in the form of a more useful message in case the same problem ever arises again.